CLASSIFICATION AND PROGNOSIS PREDICTION OF ACUTE
LYMPHOBLASTIC LEUKEMIA BY GENE EXPRESSION PROFILING 



FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
The research underlying this invention was supported in part with funds from
National Institutes of Health grants P01 CA71907-06, CA51001, CA36401, 
CA78224, Cancer Center CORE Grant CA-21765, and National Science Foundation 
grant EIA-0074869. The United States Government may have an interest in the
subject matter of the invention. 

BACKGROUND OF THE INVENTION 
Pediatric acute lymphoblastic leukemia (ALL) is one of the great success 
stories of modern cancer therapy, with contemporary treatment protocols achieving 
overall long-term event-free survival rates approaching 80% (Schrappe et al. (2000)
Blood 95:3310-22; Silverman et al. (2001) Blood 97:1211-18; and Pui and Evans
(1998) N. Engl. J. Med. 339:605-15). This success has been achieved in part by using
risk-adapted therapy that involves tailoring the intensity of treatment to each patient's 
risk of relapse. This approach was developed following the realization that pediatric 
ALL is a heterogeneous disease consisting of various leukemia subtypes that differ 
markedly in their response to chemotherapy (reviewed in Pui and Evans (1998) N.
Engl. J. Med. 339:605-15). By tailoring the intensity of treatment to a patient's
relative risk of relapse, patients are neither under-treated nor over-treated, and are thus
afforded the highest chance for a cure.

Critical to the success of this approach has been the accurate assignment of 
individual patients to specific risk groups. Although risk assignment is influenced by 
a variety of clinical and laboratory parameters, the genetic alterations that underlie the 
pathogenesis of individual leukemia subtypes figure prominently in most
classification schemes (Silverman et al. (2001) Blood 97:1211-18; and Pui and
Evans (1998) N. Engl. J. Med. 339:605-15). Through systematic immunophenotyping
and cytogenetic analysis, and the subsequent molecular cloning of the genes targeted 
by the identified chromosomal rearrangements, a number of genetically distinct 
leukemia subtypes have been defined. These include B-lineage leukemias that 
contain t(9;22)[BCR-ABL], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1],
rearrangements in the MLL gene on chromosome 11, band q23, or a hyperdiploid
karyotype (i.e., >50 chromosomes), and T-lineage leukemias (T-ALL) (Silverman et
al. (2001) Blood 97:1211-18; and Pui and Evans (1998) N. Engl. J. Med. 339:605-15).
The underlying genetic lesions in these leukemia subtypes influence the response to 
cytotoxic drugs. For example, leukemias that express the E2A-PBX1 fusion protein 
respond poorly to conventional antimetabolite-based treatment, but have cure rates 
approaching 80% when treated with more intensive therapies (Raimondi et al. (1990)
J. Clin. Oncol. 8:1380-88; and Hunger (1996) Blood 87:1211-1224). Similarly, ALLs
that express BCR-ABL and ALLs in infants with MLL rearrangements have exceedingly poor
cure rates with conventional chemotherapy, and allogeneic hematopoietic stem cell
transplantation with an HLA-matched sibling donor has already been shown to improve
outcome for patients with the former leukemia subtype (Pui et al. (1991) Blood
77:440-46; Heerema et al. (1999) Leukemia 13:679-86; Arico et al. (2000) N. Engl. J.
Med. 342:998-1006; and Biondi et al. (2000) Blood 96:24-33).

Unfortunately, the accurate assignment of patients to specific risk groups is a 
difficult and expensive process, requiring intensive laboratory studies including 
immunophenotyping, cytogenetics, and molecular diagnostics (Pui and Evans (1998)
N. Engl. J. Med. 339:605-15; and Pui et al. (2001) Lancet Oncology 2:597-607).
Moreover, these diagnostic approaches require the collective expertise of a number of 
professionals, and although this expertise is available at most major medical centers, it 
is generally unavailable in developing countries. Accordingly, there remains a need 
for rapid, less expensive methods of assigning patients affected by ALL into known 
leukemia risk groups and identifying patients for whom there is a high risk that 
conventional therapeutic approaches will fail. 

BRIEF SUMMARY OF THE INVENTION 
The present invention provides methods and compositions useful for 
diagnosing and choosing treatment for subjects affected by leukemia. The claimed
methods include methods of assigning a subject affected by leukemia to a leukemia
risk group, methods of predicting whether a subject affected by leukemia has an
increased risk of relapse, methods of predicting whether a subject affected by
leukemia has an increased risk of developing secondary acute myeloid leukemia
(AML), methods to aid in the determination of a prognosis for a subject affected by
leukemia, methods of choosing a therapy for a subject affected by leukemia, and
methods of monitoring the disease state in a subject undergoing one or more therapies
for leukemia. Methods of screening test compounds to identify therapeutic
compounds useful for the treatment of leukemia and molecular targets for these
therapeutic compounds are also provided.

The claimed methods comprise providing an expression profile of a sample 
from a subject affected by leukemia and comparing this subject expression profile to 
one or more reference expression profiles. In one embodiment, the reference profiles 
are associated with leukemia risk groups, and the subject expression profile is 
compared to one or more of these risk group reference profiles to thereby assign the
subject affected by leukemia to a leukemia risk group. In another embodiment, one or
more reference profiles are associated with relapse of leukemia and the subject
expression profile is compared to one or more of these relapse reference profiles to
determine if the subject has an increased risk of relapse. In yet another embodiment,
one or more reference profiles are associated with secondary AML, and the subject
expression profile is compared to one or more of these reference profiles to determine
whether the subject has an increased risk of developing secondary AML. 

The present invention also provides compositions useful for diagnosing and 
choosing a therapy for subjects affected by leukemia. These compositions include 
arrays comprising a plurality of capture probes that can bind specifically to nucleic
acid molecules that are differentially expressed in leukemia risk groups, in leukemia
subjects who have relapsed, or in leukemia subjects who have developed secondary
AML. Also provided is a computer-readable medium comprising digitally-encoded
expression profiles comprising values representing the expression levels of genes that
are differentially expressed in leukemia risk groups, in leukemia subjects who have
relapsed, or in leukemia subjects who have developed secondary AML. Additional 
compositions of the invention include kits comprising an array of capture probes that 
can bind specifically to nucleic acid molecules that are differentially expressed in
leukemia risk groups, in leukemia subjects who have relapsed, or in leukemia subjects 
who have developed secondary AML, and a computer-readable medium having 
digitally encoded expression profiles with values representing the expression level of 
a nucleic acid molecule detected by the array. 

DETAILED DESCRIPTION OF THE INVENTION 
The present invention provides a single platform, expression analysis, that can 
accurately identify each of the known prognostically and therapeutically relevant 
subgroups of leukemia and predict the risk of relapse and the risk of secondary 
(therapy-induced) AML in patients having leukemia. The methods and compositions 
of the invention provide tools useful in choosing a therapy for leukemia patients, 
including methods for assigning a leukemia patient to a leukemia risk group, methods 
of predicting whether a leukemia patient has an increased risk of relapse, methods of 
predicting whether a leukemia patient has an increased risk of developing secondary 
(therapy-induced) AML, methods of choosing a therapy for a leukemia patient, 
methods of determining the efficacy of a therapy in a leukemia patient, and methods 
of determining the prognosis for a leukemia patient. 

The methods of the invention comprise the steps of providing an expression 
profile from a sample from a subject affected by leukemia and comparing this subject 
expression profile to one or more reference profiles that are associated with a 
particular physiologic condition, such as a leukemia risk group, the occurrence of 
relapse, or the development of secondary AML. By identifying the leukemia risk 
group reference profile that is most similar to the subject expression profile, the 
subject can be assigned to a leukemia risk group. Similarly, the risk that a subject 
affected by leukemia will relapse or develop secondary AML can be predicted by 
determining whether the expression profile from the subject is sufficiently similar to a 
reference profile associated with relapse or a reference profile associated with the 
development of secondary AML. 

In another embodiment, the subject expression profile is from a subject affected by 
leukemia who is undergoing a therapy to treat the leukemia. The subject expression 
profile is compared to one or more reference expression profiles of the invention to 
monitor the efficacy of the therapy.

Expression Profiles

As used herein, an "expression profile" comprises one or more values 
corresponding to a measurement of the relative abundance of a gene expression 
product. Such values may include measurements of RNA levels or protein 
abundance. Thus, the expression profile can comprise values representing the 
measurement of the transcriptional state or the translational state of the gene. See, 
U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,344,316, and 6,033,860, which are
hereby incorporated by reference in their entireties. 

The transcriptional state of a sample includes the identities and relative 
abundance of the RNA species, especially mRNAs present in the sample. Preferably, 
a substantial fraction of all constituent RNA species in the sample are measured, but 
at least a sufficient fraction to characterize the transcriptional state of the sample is 
measured. The transcriptional state can be conveniently determined by measuring 
transcript abundance by any of several existing gene expression technologies. 

Translational state includes the identities and relative abundance of the 
constituent protein species in the sample. As is known to those of skill in the art, the 
transcriptional state and translational state are related. 

In some embodiments, the expression profiles of the present invention are 
generated from samples from subjects affected by leukemia, including subjects having 
leukemia, subjects suspected of having leukemia, subjects having a propensity to 
develop leukemia, subjects who have previously had leukemia, or subjects
undergoing therapy for leukemia. The samples from the subject used to generate the 
expression profiles of the present invention can be derived from a variety of sources 
including, but not limited to, single cells, a collection of cells, tissue, cell culture, 
bone marrow, blood, or other bodily fluids. The tissue or cell source may include a 
tissue biopsy sample, a cell sorted population, cell culture, or a single cell. Sources 
for the sample of the present invention include cells from peripheral blood or bone 
marrow, such as blast cells from peripheral blood or bone marrow. 

In selecting a sample, the percentage of the sample that constitutes cells 
having differential gene expression in leukemia risk groups, relapse, or secondary 
AML should be considered. Samples may comprise at least 20%, at least 30%, at 
least 40%, at least 50%, at least 55%, at least 60%, at least 70%, at least 75%, at least 
80%, at least 85%, at least 90%, or at least 95% cells having differential expression in
leukemia risk groups, relapse, or secondary AML, with a preference for samples
having a higher percentage of such cells. In some embodiments, these cells are blast
cells, such as leukemic cells. The percentage of a sample that constitutes blast cells 
may be determined by methods well known in the art; see, for example, the methods 
described elsewhere herein. 

In some embodiments of the present invention, the expression profiles 
comprise values representing the expression levels of genes that are differentially 
expressed in leukemia risk groups, in subjects affected by leukemia who have 
relapsed, or in subjects affected by leukemia who have developed secondary AML. 
The term "differentially expressed" as used herein means that the measurement of a 
cellular constituent varies in two or more samples. The cellular constituent may be 
upregulated in a sample from a subject having one physiologic condition in 
comparison with a sample from a subject having a different physiologic condition, or 
downregulated in a sample from a subject having one physiologic condition in
comparison with a sample from a subject having a different physiologic condition.
For example, in one embodiment, the differentially expressed genes of the present
invention may be expressed at different levels in different leukemia risk groups. In
another embodiment, the differentially expressed genes are expressed at different
levels in subjects affected by leukemia who will relapse after conventional treatment
in comparison with subjects affected by leukemia who will not relapse and thus will
remain in continuous complete remission. In yet another embodiment, the
differentially expressed genes are expressed at different levels in subjects affected by
leukemia who will develop secondary AML in comparison with subjects affected by 
leukemia who will not develop secondary AML. 

The present invention provides groups of genes that are differentially 
expressed in diagnostic leukemia samples of patients in different risk groups, or in 
patients that go on to develop a relapse or a therapy-induced (secondary) AML. Some
of these genes were identified based on gene expression levels for 12,600 probes in 
360 leukemia samples. Values representing the expression levels of the nucleic acid 
molecules detected by the probes were analyzed using five different statistical metrics 
to identify genes that were differentially expressed in leukemia risk groups. The 
methods used to analyze the expression level values to identify differentially 
expressed genes were the Chi-square statistics method, the Correlation-based Feature
Selection method, the T-statistics method, the Wilkins' method, and the
self-organizing map and discriminant analysis with variance metric. Although different
methods of analysis resulted in the selection of different groups of differentially
expressed genes, the genes selected by each method could be used to create an
expression profile that could accurately determine whether a leukemia patient should
be assigned to a risk group, with an overall diagnostic accuracy of about 96%. See
the Experimental section.

Additional genes that are differentially expressed in diagnostic leukemia 
samples were identified based on gene expression levels for 26,825 probes in a subset 
of 132 leukemia samples selected from the 360 leukemia samples described above. A 
chi-squared metric followed by a permutation test was used to identify discriminating
genes for the T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and
Hyperdiploid >50 chromosomes risk groups. Genes whose expression is limited to a single B-cell
lineage were also identified, and are provided in Tables 70-74. 

Thus, distinct sets of differentially expressed genes that can be used to 
distinguish the T-lineage, hyperdiploid >50 chromosomes, BCR-ABL, E2A-PBX1, 
TEL-AML1, and MLL gene rearrangement risk groups are provided. Examples of 
genes that are differentially expressed in the T-ALL risk group are shown in Tables 7, 
14, 21, 28, 35, 59, and 67. Examples of genes that are differentially expressed in the 
E2A-PBX1 risk group are shown in Tables 3, 10, 17, 24, 31, 55, 64, and 71. 
Examples of genes that are differentially expressed in the TEL-AML1 risk group are 
shown in Tables 8, 15, 22, 29, 36, 60, 68, and 74. Examples of genes that are 
differentially expressed in the BCR-ABL risk group are shown in Tables 2, 9, 16, 23, 
30, 54, 63, and 70. Examples of genes that are differentially expressed in the MLL 
risk group are shown in Tables 5, 12, 19, 26, 33, 57, 66, and 73. Examples of genes 
that are differentially expressed in the Hyperdiploid >50 risk group are shown in 
Tables 4, 11, 18, 25, 32, 56, 65, and 72.

The present invention further provides a seventh leukemia risk group, herein 
termed "Novel," that can be distinguished from the previously-described leukemia 
risk groups based on expression profiling. The expression profiles from subjects in 
the Novel risk group are distinguishable from those of the T-ALL, E2A-PBX1, TEL- 
AML1, BCR-ABL, MLL, and Hyperdiploid >50 risk groups. Subjects assigned to the 
Novel risk group have similar expression profiles. Examples of genes that are
differentially expressed in the Novel leukemia risk group are shown in Tables 4, 11,
18, 25, 32, and 58.

Similarly, sets of differentially expressed genes associated with leukemia
patients in the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, and Other (i.e., not the
T-ALL, hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or BCR-ABL) risk groups
who have undergone relapse were identified. Examples of differentially expressed
genes associated with relapse in subjects in the T-ALL risk group are shown in Table
44. Examples of differentially expressed genes associated with relapse in subjects in
the hyperdiploid >50 risk group are shown in Table 45. Examples of differentially
expressed genes associated with relapse in subjects in the TEL-AML1 risk group are
shown in Table 46. Examples of differentially expressed genes associated with relapse
in subjects in the MLL risk group are shown in Table 47. Examples of differentially
expressed genes associated with relapse in subjects in the E2A-PBX1, BCR-ABL, and
Novel risk groups are shown in Table 48.

The invention also provides genes that are differentially expressed in subjects
affected by TEL-AML1 who have developed secondary (treatment-induced) AML.
Examples of such genes are shown in Table 52.

The present invention also reveals genes with a high differential level of
expression in leukemic compared to normal cells. These highly differentially
expressed genes are selected from the genes shown in Tables 2-36, 44-48, 63-68,
and 70-74. These genes and their expression products are useful as markers to detect
the presence of minimal residual disease (MRD) in a patient. Antibodies or other
reagents or tools may be used to detect the presence of these telltale markers of MRD.

The expression profiles of the invention comprise one or more values
representing the expression level of a gene having differential expression in a
leukemia risk group, in subjects affected by leukemia who will relapse after
conventional therapy, or in subjects affected by leukemia who will develop secondary
AML after conventional therapy. Each expression profile contains a sufficient
number of values such that the profile can be used to distinguish one leukemia risk
group from another, or to distinguish subjects who will relapse after conventional
therapy from those who will not relapse, or to distinguish subjects who will develop
secondary AML after conventional therapy from those who will not develop
secondary AML. In some embodiments, the expression profiles comprise only one
value. For example, it can be determined whether a subject affected by leukemia is in
the T-ALL risk group based only on the expression level of the CD3D antigen (NCBI 
Accession No. AA919102; see Table 14). Similarly, it can be determined whether a 
subject affected by leukemia is in the E2A-PBX1 risk group based only on the 
expression level of the cDNA of NCBI Accession No. AL049381 (see Table 10). In 
other embodiments, the expression profile comprises more than one value 
corresponding to a differentially expressed gene, for example at least 2 values, at least 
3 values, at least 4 values, at least 5 values, at least 6 values, at least 7 values, at least 
8 values, at least 9 values, at least 10 values, at least 11 values, at least 12 values, at
least 13 values, at least 14 values, at least 15 values, at least 16 values, at least 17
values, at least 18 values, at least 19 values, at least 20 values, at least 22 values, at
least 25 values, at least 27 values, at least 30 values, at least 35 values, at least 40
values, at least 45 values, at least 50 values, at least 75 values, at least 100 values, at 
least 125 values, at least 150 values, at least 175 values, at least 200 values, at least 
250 values, at least 300 values, at least 400 values, at least 500 values, at least 600 
values, at least 700 values, at least 800 values, at least 900 values, at least 1000 
values, at least 1200 values, at least 1500 values, or at least 2000 values.

It is recognized that the diagnostic accuracy of assigning a subject to a 
leukemia risk group, determining whether a subject has an increased risk for relapse, 
or determining whether a subject has an increased risk of developing secondary AML 
will vary based on the number of values contained in the expression profile. 
Generally, the number of values contained in the expression profile is selected such 
that the diagnostic accuracy is at least 85%, at least 87%, at least 90%, at least 91%, at 
least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 
98%, or at least 99%, as calculated using methods described elsewhere herein, with an 
obvious preference for higher percentages of diagnostic accuracy. 
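
The relationship between profile size and diagnostic accuracy can be illustrated with a minimal Python sketch. It assumes that an accuracy estimate is already available for each candidate profile size (for example, from cross-validation). The function names and the numbers below are hypothetical and are not taken from the Experimental section.

```python
def diagnostic_accuracy(predicted, actual):
    """Fraction of subjects whose predicted label matches the known label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def smallest_profile_size(accuracy_by_size, target=0.85):
    """Return the smallest number of values whose accuracy meets the target.

    accuracy_by_size maps a candidate profile size to an accuracy estimate.
    Returns None when no candidate size reaches the target.
    """
    qualifying = [n for n, acc in accuracy_by_size.items() if acc >= target]
    return min(qualifying) if qualifying else None


# Hypothetical accuracy estimates, invented for illustration only.
example_estimates = {1: 0.80, 5: 0.88, 20: 0.93, 50: 0.96}
chosen_size = smallest_profile_size(example_estimates, target=0.90)
```

Under these invented estimates, a target of 90% would be met by the 20-value profile, and a target that no candidate reaches yields `None` rather than a guess.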

It is further recognized that the diagnostic accuracy of assigning a subject to a
leukemia risk group, determining whether a subject has an increased risk for relapse, 
or determining whether a subject has an increased risk of developing secondary AML 
will vary based on the strength of the correlation between the expression levels of the 
differentially expressed genes and the associated physiologic condition. When the 
values in the expression profiles represent the expression levels of genes whose 
expression is strongly correlated with the physiologic condition, it may be possible to
use fewer values in the expression profile and still obtain an acceptable
level of diagnostic or prognostic accuracy.

The strength of the correlation between the expression level of a differentially 
expressed gene and the presence or absence of a particular physiologic state may be 
determined by a statistical test of significance. For example, the chi square test used
to select genes in some embodiments of the present invention assigns a chi square
value to each differentially expressed gene, indicating the strength of the correlation
between the expression of that gene and the presence or absence of the associated
physiologic condition. Similarly, the T-statistics metric and the Wilkins' metric both
provide a value or score indicative of the strength of the correlation between the
expression of the gene and the absence or presence of the associated physiologic
conditions. These scores may be used to select the genes whose expression levels
have the greatest correlation with a particular physiologic state in order to increase the
diagnostic or prognostic accuracy of the methods of the invention, or in order to
reduce the number of values contained in the expression profile while maintaining the
diagnostic or prognostic accuracy of the expression profile. 
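
As an illustrative sketch only, the following Python code ranks genes by a two-sample t-statistic and keeps the highest-scoring ones. It assumes the Welch (unequal variance) form of the statistic, which may differ from the T-statistics metric described in the Experimental section. The gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Welch two-sample t-statistic for one gene's expression values.

    Each group needs at least two values and a non-zero combined variance.
    """
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)


def rank_genes(expression, in_group, top_n):
    """Rank genes by |t| between samples inside and outside a risk group.

    expression: dict mapping gene name -> list of values, one per sample.
    in_group:   list of booleans, True where the sample is in the group.
    """
    scores = {}
    for gene, values in expression.items():
        inside = [v for v, flag in zip(values, in_group) if flag]
        outside = [v for v, flag in zip(values, in_group) if not flag]
        scores[gene] = t_statistic(inside, outside)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return [(g, scores[g]) for g in ranked[:top_n]]


# Hypothetical expression values for three genes in six samples.
expression = {
    "gene_A": [9.0, 9.5, 10.0, 2.0, 2.5, 3.0],
    "gene_B": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
    "gene_C": [1.0, 1.5, 2.0, 7.0, 7.5, 8.0],
}
in_group = [True, True, True, False, False, False]
top = rank_genes(expression, in_group, top_n=2)
```

Ranking by absolute value keeps both upregulated genes (positive score, such as the hypothetical gene_A) and downregulated genes (negative score, such as gene_C), which matches the use of absolute-value cutoffs for this kind of score.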

For example, in one embodiment the chi square test is used to determine the 
significance of the differentially expressed genes whose expression levels are 
included in the array, and only those genes having a chi square value of more than 20, 
more than 25, more than 30, more than 35, more than 40, more than 45, more than 50,
more than 55, more than 60, more than 65, more than 70, more than 75, more than 80, 
more than 90, more than 100, more than 120, more than 140, more than 160, more 
than 180, or more than 200 are selected. 
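
A chi square cutoff of this kind could be applied as in the following Python sketch. The sketch assumes that each gene's expression level is first reduced to "high" or "low" by a per-gene cutoff, which is one simple discretization and not necessarily the one used in the Experimental section. The gene names, cutoffs, and values are hypothetical.

```python
def chi_square_2x2(values, in_group, cutoff):
    """Pearson chi square for 'expression above cutoff' versus group membership."""
    # Observed counts: rows = in group / not in group, columns = high / low.
    table = [[0, 0], [0, 0]]
    for value, flag in zip(values, in_group):
        row = 0 if flag else 1
        col = 0 if value > cutoff else 1
        table[row][col] += 1
    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][c] + table[1][c] for c in (0, 1)]
    chi2 = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = row_sums[r] * col_sums[c] / total
            if expected > 0:  # skip empty margins to avoid dividing by zero
                chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


def select_genes(expression, in_group, cutoffs, min_chi2=20.0):
    """Keep only genes whose chi square value exceeds min_chi2."""
    selected = {}
    for gene, values in expression.items():
        score = chi_square_2x2(values, in_group, cutoffs[gene])
        if score > min_chi2:
            selected[gene] = score
    return selected


# Hypothetical data: 15 samples in the risk group, 15 outside it.
in_group = [True] * 15 + [False] * 15
expression = {
    "marker": [10.0] * 15 + [1.0] * 15,  # high only inside the group
    "flat": [10.0, 1.0] * 15,            # unrelated to group membership
}
cutoffs = {"marker": 5.0, "flat": 5.0}
selected = select_genes(expression, in_group, cutoffs, min_chi2=20.0)
```

For a 2x2 table the statistic cannot exceed the number of samples, so a cutoff of 20 can only be passed when more than 20 samples are available.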

In another embodiment, the T-statistics metric is used to determine the
significance of the differentially expressed genes whose expression levels are
included in the array, and only those genes with a score having an absolute value of
greater than 4, greater than 5, greater than 6, greater than 7, greater than 8, greater
than 9, greater than 10, greater than 12, greater than 25, greater than 27, greater than
30, or greater than 35 are selected.

In yet another embodiment, the Wilkins' metric is used to determine the
significance of the differentially expressed genes whose expression levels are
included in the array, and only those genes having a score of greater than 0.55, greater
than 0.57, greater than 0.59, greater than 0.61, greater than 0.63, greater than 0.65,
greater than 0.67, greater than 0.69, greater than 0.71, greater than 0.73, greater than
0.75, greater than 0.77, greater than 0.79, greater than 0.81, greater than 0.83, or
greater than 0.85 are selected.

Each value in the expression profiles of the invention is a measurement
representing the absolute or the relative expression level of a differentially expressed
gene. The expression levels of these genes may be determined by any method
known in the art for assessing the expression level of an RNA or protein molecule in a 
sample. For example, expression levels of RNA may be monitored using a membrane 
blot (such as used in hybridization analysis such as Northern, Southern, dot, and the 
like), or microwells, sample tubes, gels, beads or fibers (or any solid support 
comprising bound nucleic acids). See U.S. Patent Nos. 5,770,722, 5,874,219, 
5,744,305, 5,677,195 and 5,445,934, which are expressly incorporated herein by 
reference. The gene expression monitoring system may also comprise nucleic acid 
probes in solution. 

In one embodiment of the invention, microarrays are used to measure the 
values to be included in the expression profiles. Microarrays are particularly well 
suited for this purpose because of the reproducibility between different experiments. 
DNA microarrays provide one method for the simultaneous measurement of the 
expression levels of large numbers of genes. Each array consists of a reproducible 
pattern of capture probes attached to a solid support. Labeled RNA or DNA is 
hybridized to complementary probes on the array and then detected by laser scanning. 
Hybridization intensities for each probe on the array are determined and converted to 
a quantitative value representing relative gene expression levels. See the
Experimental section. See also U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135,
6,033,860, and 6,344,316, which are incorporated herein by reference. High-density
oligonucleotide arrays are particularly useful for determining the gene expression
profile for a large number of RNAs in a sample.

In one approach, total mRNA isolated from the sample is converted to labeled 
cRNA and then hybridized to an oligonucleotide array. Each sample is hybridized to 
a separate array. Relative transcript levels are calculated by reference to appropriate 
controls present on the array and in the sample. See, for example, the Experimental 
section.

In another embodiment, the values in the expression profile are obtained by
measuring the abundance of the protein products of the differentially-expressed genes.
The abundance of these protein products can be determined, for example, using
antibodies specific for the protein products of the differentially-expressed genes. The
term "antibody" as used herein refers to an immunoglobulin molecule or
immunologically active portion thereof, i.e., an antigen-binding portion. Examples of
immunologically active portions of immunoglobulin molecules include F(ab) and
F(ab')2 fragments which can be generated by treating the antibody with an enzyme
such as pepsin.

The antibody can be a polyclonal, monoclonal, recombinant (e.g., chimeric
or humanized), fully human, non-human (e.g., murine), or single chain antibody. In a
preferred embodiment it has effector function and can fix complement. The antibody
can be coupled to a toxin or imaging agent.

A full-length protein product from a differentially-expressed gene, or an
antigenic peptide fragment of the protein product, can be used as an immunogen.
Preferred epitopes encompassed by the antigenic peptide are regions of the protein
product of the differentially expressed gene that are located on the surface of the
protein, e.g., hydrophilic regions, as well as regions with high antigenicity. The
antibody can be used to detect the protein product of the differentially expressed gene
in order to evaluate the abundance and pattern of expression of the protein. These
antibodies can also be used diagnostically to monitor protein levels in tissue as part of
a clinical testing procedure, e.g., to determine the efficacy of a given
therapy. Detection can be facilitated by coupling (i.e., physically linking) the
antibody to a detectable substance (i.e., antibody labeling). Examples of detectable
substances include various enzymes, prosthetic groups, fluorescent materials,
luminescent materials, bioluminescent materials, and radioactive materials. Examples
of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group
complexes include streptavidin/biotin and avidin/biotin; examples of suitable
fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate,
rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an
example of a luminescent material includes luminol; examples of bioluminescent
materials include luciferase, luciferin, and aequorin; and examples of suitable
radioactive material include 125I, 131I, 35S, or 3H.

Once the values comprised in the subject expression profile and the reference 
expression profile or expression profiles are established, the subject profile is 
compared to the reference profile to determine whether the subject expression profile 
is sufficiently similar to the reference profile. Alternatively, the subject expression 
profile is compared to a plurality of reference expression profiles to select the 
reference expression profile that is most similar to the subject expression profile. 
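
The selection of the most similar reference profile can be sketched in Python as follows. Pearson correlation is assumed here as the similarity measure purely for illustration; other comparison methods could be used instead. The profile values are hypothetical, and each profile is assumed to list the same genes in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def most_similar_reference(subject, references):
    """Return the name of the best-matching reference and all similarity scores."""
    scores = {name: pearson(subject, profile)
              for name, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles over the same four genes.
references = {
    "T-ALL": [9.0, 1.0, 2.0, 1.5],
    "E2A-PBX1": [1.0, 8.5, 2.0, 1.0],
    "TEL-AML1": [1.5, 1.0, 9.0, 2.0],
}
subject = [8.0, 1.5, 2.5, 1.0]
best, scores = most_similar_reference(subject, references)
```

Returning every score alongside the best match lets a caller check how clearly the chosen reference stands out from the alternatives before relying on the assignment.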

Any method known in the art for comparing two or more data sets to detect
similarity between them may be used to compare the subject expression profile to the
reference expression profiles. In some embodiments, the subject expression profile
and the reference profile are compared using a supervised learning algorithm such as
the support vector machine (SVM) algorithm, prediction by collective likelihood of
emerging patterns (PCL) algorithm, the k-nearest neighbor algorithm, or the Artificial
Neural Network algorithm. Each of these algorithms is described in the Experimental
section of the application. To determine whether a subject expression profile shows
"statistically significant similarity" or "sufficient similarity" to a reference profile,
statistical tests may be performed to determine whether the similarity between the
subject expression profile and the reference expression profile is likely to have been
achieved by a random event. An example of such a statistical test is the permutation
test described in the Experimental section; however, any statistical test that can
calculate the likelihood that the similarity between the subject expression profile and
the reference profile results from a random event can be used. The accuracy of
assigning a subject to a risk group based on similarity between an expression profile
for the subject and an expression profile for the risk group depends in part on the
degree of similarity between the two profiles. Therefore, when more accurate
diagnoses are required, the stringency with which the similarity between the subject
expression profile and the reference profile is evaluated should be increased. For
example, in various embodiments, the p-value obtained when comparing the subject
expression profile to a reference profile that shares sufficient similarity with the
subject expression profile is less than 0.20, less than 0.15, less than 0.10, less than
0.09, less than 0.08, less than 0.07, less than 0.06, less than 0.05, less than 0.04, less
than 0.03, less than 0.02, or less than 0.01.
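
A permutation test of the kind referred to above might be sketched as follows. This is an assumption-laden illustration, not the procedure of the Experimental section: similarity is taken to be Pearson correlation, the null distribution is built by shuffling the subject's gene values, and all names and numbers are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(subject, reference, n_permutations=1000, seed=0):
    """Estimate how often a shuffled subject profile is at least as similar
    to the reference as the observed profile (assumed null model)."""
    rng = random.Random(seed)
    observed = pearson(subject, reference)
    shuffled = list(subject)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(shuffled, reference) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical eight-gene profiles (illustrative only).
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
subject = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.2, 8.0]

p_value = permutation_p_value(subject, reference)
sufficiently_similar = p_value < 0.05  # one of the thresholds listed above
```

A stricter threshold (for example 0.01) would correspond to the increased stringency described for cases where more accurate diagnoses are required.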
In some embodiments, the assignment of a subject affected by leukemia to a
leukemia risk group, the prediction of whether a subject affected by leukemia has an
increased risk of relapse, or the prediction of whether a subject affected by
leukemia has an increased risk of developing secondary AML is used in a method of
choosing a therapy for the subject affected by leukemia. A therapy, as used herein,
refers to a course of treatment intended to reduce or eliminate the effects or symptoms
of a disease, in this case leukemia. A therapy regimen will typically comprise, but is
not limited to, a prescribed dosage of one or more drugs or hematopoietic stem cell
transplantation. Therapies, ideally, will be beneficial and reduce the disease state, but
in many instances a therapy will have non-desirable effects as well.
Thus, the methods of the invention are useful for monitoring the effectiveness of a
therapy even when non-desirable side-effects are observed.

Arrays, Computer-Readable Medium, and Kits 

The present invention provides compositions that are useful in determining the
gene expression profile for a subject affected by leukemia and selecting a reference
profile that is similar to the subject expression profile. These compositions include
arrays comprising a substrate having capture probes that can bind specifically to
nucleic acid molecules that are differentially expressed in leukemia risk groups,
subjects affected by leukemia who will relapse after conventional therapy, or subjects
affected by leukemia who will develop secondary AML after conventional therapy.
Also provided is a computer-readable medium having digitally encoded reference
profiles useful in the methods of the claimed invention. The invention also
encompasses kits comprising an array of the invention and a computer-readable
medium having digitally encoded reference profiles with values representing the
expression of nucleic acid molecules detected by the arrays. These kits are useful for
assigning a subject affected by leukemia to a leukemia risk group, predicting whether
a subject affected by leukemia has an increased risk of relapse, and predicting whether
a subject affected by leukemia has an increased risk of developing secondary AML.






The present invention provides arrays comprising capture probes for detecting
the differentially expressed genes of the invention. By "array" is intended a solid
support or substrate with peptide or nucleic acid probes attached to said support or
substrate. Arrays typically comprise a plurality of different nucleic acid or peptide
capture probes that are coupled to a surface of a substrate in different, known
locations. These arrays, also described as "microarrays" or colloquially "chips," have
been generally described in the art, for example, in U.S. Patent Nos. 5,143,854,
5,445,934, 5,744,305, 5,677,195, 6,040,193, 5,424,186, 6,329,143, and 6,309,831 and
Fodor et al. (1991) Science 251:767-77, each of which is incorporated by reference in
its entirety. These arrays may generally be produced using mechanical synthesis
methods or light-directed synthesis methods that incorporate a combination of
photolithographic methods and solid phase synthesis methods.

Techniques for the synthesis of these arrays using mechanical synthesis
methods are described in, e.g., U.S. Patent No. 5,384,261, incorporated herein by
reference in its entirety for all purposes. Although a planar array surface is preferred,
the array may be fabricated on a surface of virtually any shape or even a multiplicity
of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric
surfaces, fibers such as fiber optics, glass, or any other appropriate substrate; see U.S.
Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193, and 5,800,992, each of which is
hereby incorporated in its entirety for all purposes. Arrays may be packaged in such a
manner as to allow for diagnostics or other manipulation of an all-inclusive device.
See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591, herein incorporated by
reference.

The arrays provided by the present invention comprise capture probes that can
specifically bind a nucleic acid molecule that is differentially expressed in leukemia
risk groups, a nucleic acid molecule that is differentially expressed in subjects
affected by leukemia who will relapse after conventional therapy, or a nucleic acid
molecule that is differentially expressed in subjects affected by leukemia who will
develop secondary AML after conventional therapy. These arrays can be used to
measure the expression levels of nucleic acid molecules to thereby create an
expression profile for use in methods of determining the diagnosis and prognosis for
leukemia patients, and for monitoring the efficacy of a therapy in these patients as
described elsewhere herein.

In some embodiments, each capture probe in the array detects a nucleic acid
molecule selected from the nucleic acid molecules designated in Tables 2-36, 44-49,
52, 54-60, 63-68, and 70-74. The designated nucleic acid molecules include those
differentially expressed in leukemia risk groups selected from the T-ALL risk group
(Tables 7, 14, 21, 28, 35, 59, and 67), E2A-PBX1 risk group (Tables 3, 10, 17, 24, 31,
55, 64, and 71), TEL-AML1 risk group (Tables 8, 15, 22, 29, 36, 60, 68, and 74),
BCR-ABL risk group (Tables 2, 9, 16, 23, 30, 54, 63, and 70), MLL risk group
(Tables 5, 12, 19, 26, 33, 57, 66, and 73), Hyperdiploid >50 risk group (Tables 4, 11,
18, 25, 32, 56, 65, and 72), and Novel risk group (Tables 6, 13, 20, 27, 34, and 58),
those differentially expressed in subjects affected by leukemia who will relapse after
conventional therapy (Tables 44-48), and those differentially expressed in subjects
affected by TEL-AML1 who will develop secondary AML after conventional therapy
(Table 52).

The arrays of the invention comprise a substrate having a plurality of addresses,
where each address has a capture probe that can specifically bind a target nucleic
acid molecule. The number of addresses on the substrate varies with the purpose for
which the array is intended. The arrays may be low-density arrays or high-density
arrays and may contain 4 or more, 8 or more, 12 or more, 16 or more, 20 or more, 24
or more, 32 or more, 48 or more, 64 or more, 72 or more, 80 or more, or 96 or more
addresses, or 192 or more, 288 or more, 384 or more, 768 or more, 1536 or more,
3072 or more, 6144 or more, 9216 or more, 12288 or more, 15360 or more, or 18432
or more addresses. In some embodiments, the substrate has no more than 12, 24, 48,
96, 192, or 384 addresses, no more than 500, 600, 700, 800, or 900 addresses, or no
more than 1000, 1200, 1600, 2400, or 3600 addresses.

The invention also provides a computer-readable medium comprising one or
more digitally-encoded expression profiles, where each profile has one or more values
representing the expression of a gene that is differentially expressed in a leukemia risk
group, the expression level of a gene that is differentially expressed in subjects
affected by leukemia who will relapse after conventional therapy, or the expression
level of a gene that is differentially expressed in subjects affected by leukemia who
will develop secondary AML after conventional therapy. Such profiles are described
elsewhere herein. In some embodiments, the digitally-encoded expression profiles are
comprised in a database. See, for example, U.S. Patent No. 6,308,170.
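
One way a digitally-encoded reference profile might be stored and read back is sketched below. The JSON layout, the gene identifiers, and the function names are assumptions chosen for illustration; the text does not prescribe any particular encoding or database format.

```python
import json

def encode_profiles(profiles):
    """Serialize reference profiles (group -> gene -> expression value) to text."""
    return json.dumps(profiles, sort_keys=True, indent=2)

def decode_profiles(text):
    """Recover the reference profiles from their serialized form."""
    return json.loads(text)

# Hypothetical reference profiles; gene names and values are placeholders.
reference_profiles = {
    "T-ALL": {"gene_A": 9.1, "gene_B": 2.0},
    "TEL-AML1": {"gene_A": 1.2, "gene_B": 8.8},
}

encoded = encode_profiles(reference_profiles)
decoded = decode_profiles(encoded)
```

The encoded text could be written to any storage medium or loaded into a database table keyed by risk group and gene.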

The present invention also provides kits useful for diagnosing, treating, and
monitoring the disease state in subjects affected by leukemia. These kits comprise an
array and a computer-readable medium. The array comprises a substrate having
addresses, where each address has a capture probe that can specifically bind a nucleic
acid molecule that is differentially expressed in at least one leukemia risk group, in a
subject affected by leukemia who will relapse after conventional therapy, or in a
subject affected by leukemia who will develop secondary AML after conventional
therapy. The results are converted into a computer-readable medium that has
digitally-encoded expression profiles containing values representing the expression
level of a nucleic acid molecule detected by the array.

Methods of Screening and Therapeutic Targets 

The methods and compositions of the invention may be used to screen test
compounds to identify therapeutic compounds useful for the treatment of leukemia.
In one embodiment, the test compounds are screened in a sample comprising primary
cells or a cell line representative of a particular leukemia risk group. After treatment
with the test compound, the expression levels in the sample of one or more of the
differentially-expressed genes of the invention are measured using methods described
elsewhere herein. Values representing the expression levels of the differentially-expressed
genes are used to generate a subject expression profile. This subject
expression profile is then compared to a reference profile associated with the
leukemia risk group represented by the sample to determine the similarity between the
subject expression profile and the reference expression profile. Differences between
the subject expression profile and the reference expression profile may be used to
determine whether the test compound has anti-leukemogenic activity.

The test compounds of the present invention can be obtained using any of the
numerous approaches in combinatorial library methods known in the art, including:
biological libraries; spatially addressable parallel solid phase or solution phase
libraries; synthetic library methods requiring deconvolution; the 'one-bead
one-compound' library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is limited to polypeptide
libraries, while the other four approaches are applicable to polypeptide, non-peptide
oligomer, or small molecule libraries of compounds (Lam (1997) Anticancer Drug
Des. 12:145).

Examples of methods for the synthesis of molecular libraries can be found in
the art, for example in DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb
et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med.
Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl.
33:2061; and in Gallop et al. (1994) J. Med. Chem. 37:1233. Libraries of compounds
may be presented in solution (e.g., Houghten (1992) Biotechniques 13:412-421), or on
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556),
bacteria (U.S. Patent No. 5,223,409), spores (U.S. Patent No. 5,223,409), plasmids
(Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89:1865-1869), or on phage (Scott and
Smith (1990) Science 249:386-390; Devlin (1990) Science 249:404-406; Cwirla et
al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:6378-6382; Felici (1991) J. Mol. Biol.
222:301-310).

Candidate compounds include, for example, 1) peptides such as soluble peptides,
including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g.,
Lam et al. (1991) Nature 354:82-84; Houghten et al. (1991) Nature 354:84-86) and
combinatorial chemistry-derived molecular libraries made of D- and/or L-configuration
amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate,
directed phosphopeptide libraries, see, e.g., Songyang et al. (1993) Cell 72:767-778); 3)
antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single
chain antibodies as well as Fab, F(ab')2, Fab expression library fragments, and
epitope-binding fragments of antibodies); 4) small organic and inorganic molecules (e.g.,
molecules obtained from combinatorial and natural product libraries); 5) zinc analogs; 6)
leukotriene A4 and derivatives; 7) classical aminopeptidase inhibitors and derivatives
of such inhibitors, such as bestatin and arphamenine A and B and derivatives; and 8)
artificial peptide substrates and other substrates, such as those disclosed herein above
and derivatives thereof.

The present invention discloses a number of genes that are differentially
expressed in leukemia risk groups, in subjects affected by leukemia who will relapse
after conventional therapy, or in subjects affected by leukemia who will develop
secondary AML after conventional therapy. These differentially-expressed genes are
shown in Tables 2-36, 44-48, and 52. Because the expression of these genes is
associated with leukemia risk factors, these genes may play a role in leukemogenesis.
Accordingly, these genes and their gene products are potential therapeutic targets that
are useful in methods of screening test compounds to identify therapeutic compounds
for the treatment of leukemia.

The differentially-expressed genes of the invention may be used in cell-based
screening assays involving recombinant host cells expressing the differentially-expressed
gene product. The recombinant host cells are then screened to identify
compounds that can activate the product of the differentially-expressed gene (i.e.,
agonists) or inactivate the product of the differentially-expressed gene (i.e.,
antagonists).

Any of the leukemogenic functions mediated by the product of the differentially
expressed gene may be used as an endpoint in the screening assay for identifying
therapeutic compounds for the treatment of leukemia. Such endpoint assays include
assays for cell proliferation, assays for modulation of the cell cycle, assays for the
expression of markers indicative of leukemia, and assays for the expression level of
genes differentially expressed in leukemia risk groups as described above.

Modulators of the activity of a product of a differentially-expressed gene
identified according to the drug screening assays provided above can be used to treat a
subject with leukemia. These methods of treatment include the steps of administering
the modulators of the activity of a product of a differentially-expressed gene in a
pharmaceutical composition as described herein, to a subject in need of such treatment.


The following examples are offered by way of illustration and not by way of 
limitation. 



EXAMPLES 

EXAMPLE 1:

To determine if gene expression profiling of leukemic cells could identify
known biologic ALL subgroups, 327 diagnostic bone marrow (BM) samples were
analyzed with AFFYMETRIX® oligonucleotide microarrays (Affymetrix Inc., Santa
Clara, CA) containing 12,600 probe sets.

In an initial analysis of the gene expression data set (12,600 probe sets in 327
leukemia samples; greater than 4 x 10^6 data elements), an unsupervised
two-dimensional hierarchical clustering algorithm was used to group leukemia samples
with similar gene expression patterns against clusters of similarly expressed genes.
This analysis clearly identified 6 major leukemia subtypes that corresponded to
T-ALL, hyperdiploid with >50 chromosomes, BCR-ABL, E2A-PBX1, TEL-AML1, and
MLL gene rearrangement. Moreover, within the heterogeneous collection of
leukemias that were not assigned to one of these subtypes, a novel subgroup of 14
cases was identified that had a distinct gene expression profile. The separation of
these seven leukemia subgroups was also seen using the multidimensional scaling
procedure of discriminant analysis with variance (DAV), in which the data are
reduced into component dimensions consisting of linear combinations of
discriminating genes. For example, using the three component dimensions that
accounted for 72.8% of the variance of gene expression among the subgroups, it was
possible to distinguish T-ALL (43 cases), E2A-PBX1 (27 cases), TEL-AML1 (79
cases), and hyperdiploid >50 (64 cases) from the remaining ALL subtypes (114 cases).
Similarly, three different components that accounted for an additional 16.1% of
the variance in gene expression made it possible to discriminate cases with BCR-ABL
(15 cases), MLL gene rearrangement (20 cases), and the novel subgroup of ALL (14
cases).

Statistical methods were used to identify those genes that best define the
individual groups. Expression profiles were obtained using the top 40 genes per
subgroup as selected by a chi-square metric. Distinct groups of genes distinguish
cases defined by E2A-PBX1, MLL, T-ALL, hyperdiploid >50, BCR-ABL, the novel
subgroup, and TEL-AML1. In addition to these specific subgroups, 65 cases (20% of
the total) were identified that did not cluster into any of the leukemia subtypes. The
expression profiles of these latter cases varied markedly, suggesting that they
represent a heterogeneous group of leukemias. Nearly identical results were obtained
when the hierarchical clustering was performed with genes selected by other
statistical metrics.
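
Ranking genes by a chi-square metric, as mentioned above, might look like the sketch below. The details are assumptions: each gene is split into "high" and "low" at its own median, and a 2 x 2 chi-square statistic is computed against subgroup membership. The actual discretization used in the study is not stated in this passage, and the gene names and values are hypothetical.

```python
from statistics import median

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def rank_genes(expression, labels, target):
    """Rank genes by how well 'high vs. low' expression separates `target`
    from all other samples. `expression` maps gene -> per-sample values."""
    scores = []
    for gene, values in expression.items():
        cutoff = median(values)  # assumed discretization point
        a = b = c = d = 0
        for value, label in zip(values, labels):
            high, in_class = value > cutoff, label == target
            if high and in_class:
                a += 1
            elif high:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores.append((gene, chi_square_2x2(a, b, c, d)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical data: six samples, two genes.
labels = ["T-ALL", "T-ALL", "T-ALL", "other", "other", "other"]
expression = {
    "gene_X": [9.0, 8.0, 9.0, 1.0, 2.0, 1.0],
    "gene_Y": [5.0, 1.0, 6.0, 5.0, 2.0, 6.0],
}
ranked = rank_genes(expression, labels, "T-ALL")
top_genes = [gene for gene, _ in ranked[:40]]  # top 40 per subgroup, as in the text
```

With real data the same ranking would be repeated once per subgroup, keeping the top-scoring genes for each.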

For T-ALL, two gene clusters that discriminated this subtype from B-lineage
cases were identified. One cluster was expressed at high levels and the other was
expressed at low levels. In contrast, the top ranked discriminating genes for each of
the other leukemia subtypes consisted primarily of genes that were overexpressed
within the specific leukemia subtype. With the exception of T-ALL, the identified
expression profiles do not represent a specific differentiation stage of the leukemic
blasts. For example, although E2A-PBX1 is almost exclusively found in ALLs with a
pre-B cell immunophenotype (Hunger (1996) Blood 87:1211-24), the identified
expression profile was specific for the E2A-PBX1 genetic lesion and not the pre-B
immunophenotype.

To confirm that the microarray analysis provided an accurate reflection of
actual gene expression levels, the microarray data were compared with results for RNA
levels obtained by real-time RT-PCR (5 genes). In addition, the corresponding
protein levels were assessed by immunophenotype analysis performed by flow
cytometry (using nine specific cell surface antigens). A very high degree of
correlation was observed between the levels of RNA expression detected by
quantitative RT-PCR and microarray analysis. Similarly, in agreement with results
from immunophenotyping, T-lineage restricted RNA expression was observed for
CD2, CD3, and CD8, whereas B-lineage restricted expression was observed for
CD19 and CD22. In addition, the level of CD10 RNA expression closely correlated
with protein levels, with high expression detected in TEL-AML1 leukemias,
intermediate levels in E2A-PBX1, and low to undetectable expression in cases with
rearrangements of MLL. Thus, microarray analysis provides an accurate reflection of
expression levels for most genes, and can be used to accurately detect the expression
of the more common surface antigens used in the diagnostic evaluation of pediatric
ALL patients.

The majority of the leukemia subtype specific genes identified through this
study were not previously known to have a restricted pattern of expression. In
addition to their use as diagnostic and subclassification markers, these genes provide
unique insights into the underlying biology of the different leukemia subtypes. For
example, E2A-PBX1 leukemias were characterized by high expression of the c-Mer
receptor tyrosine kinase (MERTK), a known transforming gene (Graham et al. (1994)
Cell Growth Differ. 5:647-657; and Georgescu et al. (1999) Mol. Cell. Biol.
19:1171-81), suggesting that c-Mer may be involved in the abnormal growth of these cells.
Similarly, HOXA9 and MEIS1 were exclusively expressed in cases having MLL
rearrangements, indicating that they may be directly involved in MLL mediated
alterations in the growth of the leukemic cells. Interestingly, high expression of
MTG16, a homologue of ETO (Gamou et al. (1998) Blood 91:4028-4037), was found
in TEL-AML1 cases. Alteration of ETO family members in both t(8;21) acute
myeloid leukemia (by translocation) (Downing (1999) Br. J. Haematol. 106:296-308)
and TEL-AML1 (by altered expression) suggests that alteration in the biologic
function of ETO genes is mechanistically involved in these leukemias.

Little is known about the underlying molecular pathogenesis of hyperdiploid ALL
with >50 chromosomes, which clinically is distinct from hyperdiploid cases having 47-50
chromosomes. This distinction is supported by the marked differences in gene
expression profiles between these two subgroups. Although hyperdiploid >50 ALLs
have an excellent prognosis, the specific genetic lesions responsible for the aberrant
proliferation in these cases remain poorly understood. Interestingly, almost 70% of
the genes that define this subgroup are localized to either chromosome X or 21.
Moreover, the class defining genes on chromosome X were overexpressed in the
hyperdiploid >50 ALLs irrespective of whether the leukemic blasts had
a trisomy of this chromosome (data not shown). Detailed analysis will be required to
determine the specific signaling pathways that are disrupted as a result of the altered
expression of these genes. Lastly, the novel subgroup of ALL was defined by high
expression of a group of genes, including the receptor phosphatase PTPRM and
LHFPL2, a gene that is a part of the LHFP-like gene family, the founding member of
which was identified as the target of a lipoma-associated chromosomal translocation
(Petit et al. (1999) Genomics 57:438-41).

Expression Profiling as a Diagnostic Tool

A major goal of this study was to develop a single platform of expression
profiling to accurately identify the known, prognostically important leukemia
subtypes. To this end, computer-assisted learning algorithms were used to develop an
expression-based leukemia classification. Through a reiterative process of error
minimization, these algorithms learn to recognize the optimal gene expression
patterns for a leukemia subtype. Classification was approached using a decision tree
format, in which the first decision was T-ALL versus B-lineage (non-T-ALL), and
then within the B-lineage subset, cases were sequentially classified into the known
risk groups characterized by the presence of E2A-PBX1, TEL-AML1, BCR-ABL, or
MLL chimeric genes, and lastly hyperdiploid with >50 chromosomes. Cases not
assigned to one of these classes were left unassigned. Classification was performed
using a Support Vector Machine (SVM) algorithm with a set of discriminating genes
selected by correlation-based feature selection (CFS), or, if this method selected
greater than 20 genes for a particular class, by using the top 20 ranked genes selected
by a chi-square metric, or one of the other metrics detailed in the Experimental
Procedures section. This approach resulted in an accurate class prediction in a
randomly selected training set that consisted of two-thirds of the total cases (215
cases). When this classification model was then applied to a blind test set consisting
of the remaining 112 samples, an overall accuracy of 96% was achieved for class
assignment. The number of genes required for optimal class assignment varied
between classes. A single gene was sufficient to give 100% accuracy for both T-ALL
and E2A-PBX1, whereas 7-20 genes were required for prediction of the other classes.
Only slight differences were observed in the prediction accuracy of individual classes
when the process was repeated using genes selected by a number of other metrics,
including T-statistics, a novel metric referred to as Wilkins', or genes selected by a
combination of self organizing maps (SOM) and DAV. Moreover, nearly identical
results were obtained when the various sets of selected genes were used in a number
of different supervised learning algorithms, including k-Nearest Neighbor (k-NN),
Artificial Neural Network (ANN), and prediction by collective likelihood of emerging
patterns (PCL).
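
The sequential, decision-tree style of class assignment described above can be sketched as a cascade of binary classifiers tried in a fixed order. The sketch below is illustrative only: the per-class classifiers are simple single-gene threshold rules standing in for the trained SVMs, and the marker names and thresholds are hypothetical.

```python
# Order of decisions taken from the text: T-ALL first, then the B-lineage classes.
DECISION_ORDER = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50",
]

def threshold_rule(gene, cutoff):
    """Stand-in binary classifier: positive when `gene` exceeds `cutoff`."""
    return lambda sample: sample.get(gene, 0.0) > cutoff

def classify(sample, classifiers, order=DECISION_ORDER):
    """Return the first class whose binary classifier accepts the sample,
    or 'unassigned' when no class in the cascade accepts it."""
    for subtype in order:
        if classifiers[subtype](sample):
            return subtype
    return "unassigned"

# Hypothetical marker genes and cutoffs, one rule per class.
classifiers = {
    "T-ALL": threshold_rule("marker_tall", 5.0),
    "E2A-PBX1": threshold_rule("marker_e2a", 5.0),
    "TEL-AML1": threshold_rule("marker_tel", 5.0),
    "BCR-ABL": threshold_rule("marker_bcr", 5.0),
    "MLL": threshold_rule("marker_mll", 5.0),
    "Hyperdiploid >50": threshold_rule("marker_hyper", 5.0),
}

first = classify({"marker_tall": 9.0, "marker_tel": 8.0}, classifiers)
second = classify({"marker_mll": 7.5}, classifiers)
third = classify({"marker_tel": 1.0}, classifiers)
```

Because the cascade stops at the first accepting classifier, a sample that would satisfy more than one rule is assigned to the class that appears earlier in the order.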

Four cases initially appeared to be misclassified as TEL-AML1 by gene
expression analysis since they lacked a detectable chimeric transcript by RT-PCR.
Upon further analysis by FISH, however, one of these cases was shown to have a
TEL-AML1 fusion, presumably a variant rearrangement that could not be detected
with the amplification primers used for the TEL-AML1 RT-PCR assay. In each of
the three remaining cases, re-examination of the karyotypes revealed translocations
involving the p arm of chromosome 12. FISH analysis demonstrated that two of these
cases had deletion of one TEL allele, whereas the remaining case had a partial
deletion of one TEL allele. Thus, the identified expression profiles appear to reflect
an abnormality of the TEL transcription factor, and may in fact provide a more
accurate means of identifying a specific leukemia subtype defined by its underlying
biology. Collectively, these data demonstrate that the single platform of gene
expression profiling can accurately identify the known prognostic subtypes of ALL.






Use of Expression Profiles to Identify Patients at High Risk of Treatment Failure 

Relapse and the development of therapy-induced acute myeloid leukemia
(AML) are the major causes of treatment failure in pediatric ALL. To determine if
expression profiling might further enhance the ability to identify patients who are
likely to relapse, the expression profiles of four groups of leukemic samples were
compared. The groups of samples used for this comparison were: (i) diagnostic
samples from patients who developed hematological relapses (n = 32); (ii) diagnostic
samples from patients who remained in continuous complete remission (CCR) (n =
201); (iii) diagnostic samples from patients who developed therapy-induced AML (n
= 16); and (iv) leukemic samples collected at the time of ALL relapse (n = 25). Using
DAV, distinct gene expression profiles were identified for each of these groups.

To further assess the predictive power of the different gene expression
profiles, supervised learning algorithms were used. Because of the overwhelming
differences in the expression profiles of the different leukemia subtypes, it was not
possible to identify a single expression signature that would predict relapse
irrespective of the genetic subtype. However, within individual leukemic subtypes,
distinct expression profiles could be defined that predicted relapse. Class assignment
was performed using an SVM supervised learning algorithm with discriminating genes
selected by CFS, or, if this method returned >20 genes, the top 20 genes selected by
T-statistics. For both the T-lineage and hyperdiploid >50 subgroups, expression profiles
identified those cases that went on to relapse with an accuracy of 97% and 100%,
respectively, as assessed by cross validation. Moreover, the predictive accuracy was
statistically significant when compared to results from an analysis of 1000 random
permutations of the specific patient data set. Similarly, expression profiles predictive
of relapse were identified for TEL-AML1, MLL, or cases that lacked any of the known
genetic risk features. Although the predictive accuracy of these latter expression
profiles was very high as assessed by cross validation, it did not reach statistical
significance when compared to results from an analysis of 1000 random permutations
of the same patient data set, likely secondary to the limited number of cases. The
patterns of expression for a combination of genes, rather than expression levels of a
single gene, were found to have the greatest predictive accuracy. Since few known
risk-stratifying biologic features have been previously identified for either T-ALL or
hyperdiploid >50 ALL, the results suggest that the identified expression profiles
provide independent risk stratifying information.
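
Assessing predictive accuracy by cross validation, as described above, can be illustrated with the sketch below. Several choices here are assumptions rather than the study's method: leave-one-out is used as the cross-validation scheme (the text does not specify one), a nearest-centroid rule stands in for the SVM, and the two-gene samples are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to sample `x`."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, lab in zip(train_x, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(x, centroids[lab]))

def loocv_accuracy(xs, ys):
    """Leave-one-out cross validation: hold out each sample in turn,
    train on the rest, and report the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical two-gene expression values for six diagnostic samples.
samples = [[8.0, 1.0], [9.0, 2.0], [8.5, 1.5], [1.0, 8.0], [2.0, 9.0], [1.5, 8.5]]
outcomes = ["relapse", "relapse", "relapse", "CCR", "CCR", "CCR"]

accuracy = loocv_accuracy(samples, outcomes)
```

To judge significance in the manner the text describes, the same accuracy would be recomputed on data sets with randomly permuted outcome labels and compared with the observed value.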

A distinct expression profile was identified in the ALL blasts from patients
who developed therapy-induced AML. Because secondary AML is thought to arise
from a hematopoietic stem cell that is distinct from that giving rise to the primary
leukemia, it is difficult to understand how the biology of the original ALL blasts
could predict the risk of developing a therapy-induced complication. However, when
the accuracy of expression profiling was evaluated within the TEL-AML1
subgroup, a distinct expression signature consisting of 20 genes was defined. This
profile identified, with 100% accuracy in cross validation, all patients who developed
secondary AML, with a p value of 0.031 as assessed by comparison to results from an
analysis of 1000 random permutations of the patient data set. Genes within this
signature included RSU1, a suppressor of the Ras signaling pathway, and Msh3, a
mismatch repair enzyme.


Overview of Experimental Procedures 

A. Tumor Samples 

The diagnosis of ALL was based on the morphologic evaluation of the bone
marrow and on the pattern of reactivity of the leukemic blasts with a panel of
monoclonal antibodies directed against lineage-associated antigens. A total of 389
pediatric acute leukemia samples were analyzed in this study, from which high quality
gene expression data were obtained on 360 (93%). The successfully-analyzed samples
included 332 diagnostic BM, 3 diagnostic peripheral bloods (PB), and 25 relapsed
ALL samples from BM or PB. 264 (79%) of the diagnostic ALL BM samples and all
relapse samples were from patients enrolled on St. Jude Children's Research Hospital
Total Therapy Studies XIIIA or XIIIB and corresponded to 64% of the patients
treated on these protocols. The details of these protocols have been previously
published (Pui et al. (2000) Leukemia 14:2286-94). The remaining samples were
obtained from patients treated on St. Jude Total Therapy Studies XI, XII, XIV, XV, or
by best clinical management. All protocols and consent forms were approved by the
hospital's institutional review board, and informed consent was obtained from
parents, guardians, or patients (as appropriate). The composition of the data sets used
for the identification of gene expression profiles predictive of specific genetic
subtypes, hematological relapse, and risk of developing secondary AML is described
below.

B. Gene Expression Profiling 
RNA was extracted from cryopreserved mononuclear cell suspensions from
diagnostic BM aspirates or PB samples using TRIZOL® (Invitrogen Corp., Carlsbad,
California) according to the manufacturer's instructions, and the RNA integrity was
assessed by using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto,
CA). cDNA was synthesized using a T7-linked oligo-dT primer and cRNA was then
synthesized with biotinylated UTP and CTP. The labeled RNA was then fragmented
and hybridized to HG_U95Av2 oligonucleotide arrays (Affymetrix Incorporated,
Santa Clara, CA) according to the manufacturer's instructions.

Arrays were scanned using a laser confocal scanner (Agilent) and the
expression value for each gene was calculated using AFFYMETRIX® Microarray
Software version 4.0. The average intensity difference (AID) values were normalized
across the sample set and minimum quality control standards were established for
including a sample's hybridization data in the study. Ten percent of samples were run in
duplicate to ensure consistency of data acquisition throughout the study. A high level
of reproducibility was observed between replicate samples, with fewer than 1% of
genes showing a variation in average intensity difference of greater than 2-fold.

C. Statistical Analysis 

Unsupervised hierarchical clustering, principal component analysis (PCA),
discriminant analysis with variance (DAV), and self organizing maps (SOM) were
performed using GeneMaths software (version 1.5, Applied Maths, Belgium). Data
reduction to define the genes most useful in class distinction was performed using a
variety of metrics as detailed below. Genes selected by the various metrics were used
in supervised learning algorithms to build classifiers that could identify the specific
genetic or prognostic subgroups. The algorithms used included k-Nearest Neighbors
(k-NN), Support Vector Machine (SVM), prediction by collective likelihood of
emerging patterns (PCL), an artificial neural network (ANN), and weighted voting.
Performance of each model was initially assessed by leave-one-out cross validation
on a randomly selected stratified training set consisting of two-thirds of the total
cases. True error rates of the best performing classifiers were then determined using
the remaining third of the samples as a blinded test group. Details of the individual
metrics and supervised learning algorithms are described below.
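
The evaluation scheme described above (a stratified two-thirds training set,
leave-one-out cross validation on it, and a held-out third for testing) can be
sketched as follows. This is an illustrative outline only, not the software used in
the study: the function names, the Euclidean distance, the value k=3, and the
rounding used to size each class's training share are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_fraction)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(x, y, k=3):
    """Leave each sample out in turn and predict it from the rest."""
    correct = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_x, rest_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


# Toy example: two well-separated classes in a 2-gene expression space.
labels = ["T-ALL"] * 6 + ["MLL"] * 9
data = [[0.1 * i, 0.0] for i in range(6)] + [[10 + 0.1 * i, 0.0] for i in range(9)]
train_idx, test_idx = stratified_split(labels)
train_x = [data[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]
cv_accuracy = loocv_accuracy(train_x, train_y)
test_accuracy = sum(
    knn_predict(train_x, train_y, data[i]) == labels[i] for i in test_idx
) / len(test_idx)
```

Keeping the test third untouched until the classifier is fixed is what allows the
test-set error to be read as an estimate of the true error rate.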

Detailed Experimental Procedures

A. RNA Extraction, Labeling, Hybridization, and Data analysis 

Mononuclear cell suspensions from diagnostic BM aspirates or peripheral
blood (PB) samples were prepared from each patient and an aliquot was
cryopreserved. RNA was extracted using TRIZOL® following the manufacturer's
recommended protocol as described above. RNA integrity was assessed by
electrophoresis on the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA).

First and second strand cDNA were synthesized from 5-15 µg of total RNA
using the Superscript Double-Stranded cDNA Synthesis Kit (Invitrogen Corp.,
Carlsbad, California) and an oligo-dT24-T7 (5'-GGC CAG TGA ATT GTA ATA
CGA CTC ACT ATA GGG AGG CGG-3'; SEQ ID NO:1) primer according to the
manufacturer's instructions. cRNA was synthesized and labeled with biotinylated
UTP and CTP by in vitro transcription using the T7 promoter coupled double stranded
cDNA as template and the T7 RNA Transcript Labeling Kit according to the
manufacturer's instructions (Enzo Diagnostics Inc., Farmingdale NY). Briefly, double
stranded cDNA synthesized from the previous steps was washed twice with 70%
ethanol and resuspended in 22 µl RNase-free water. The cDNA was incubated with 4
µl of 10X each reaction buffer, 1 µl of biotin labeled ribonucleotides, 2 µl of DTT, 1 µl
of RNase inhibitor mix and 2 µl 20X T7 RNA polymerase for 5 hours at 37°C. The
labeled cRNA was separated from unincorporated ribonucleotides by passing through
a CHROMA SPIN-100 column (Clontech, Palo Alto, CA) and precipitated at -20°C
for 1 hr to overnight.

The cRNA pellet was resuspended in 10 µl RNase-free H2O and 10.0 µg was
fragmented by heat and ion-mediated hydrolysis at 95°C for 35 minutes in 200 mM
Tris-acetate, pH 8.1, 500 mM KOAc, 150 mM MgOAc. The fragmented cRNA was
hybridized for 16 hr at 45°C to HG_U95Av2 AFFYMETRIX® oligonucleotide arrays
(Affymetrix, Santa Clara, CA) containing 12,600 probe sets from full-length
annotated genes together with additional probe sets designed to represent EST
sequences. Arrays were washed at 25°C with 6X SSPE (0.9 M NaCl, 60 mM
NaH2PO4, 6 mM EDTA, 0.01% Tween 20) followed by a stringent wash at 50°C with
100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then stained with
phycoerythrin conjugated streptavidin (Molecular Probes, Eugene, OR).

Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA)
and the expression value for each gene was calculated using AFFYMETRIX®
Microarray software (MAS 4.0). The signal intensity for each gene was calculated as
the average intensity difference (AID), represented by [Σ(PM - MM)/(number of
probe pairs)], where PM and MM denote perfect-match and mismatch probes,
respectively. Expression values were normalized across the sample set by scaling the
average of the fluorescent intensities of all genes on an array to a constant target
intensity of 2500; any AID over 45,000 was then capped to a value of 45,000. All
AIDs less than 100, including negative values and absent calls, were converted to a
value of 1. In addition, a variation filter was used to eliminate any probe set in which
fewer than 1% of the samples had a present call, or in which the Max AID - Min AID across
the sample set was less than 100. The average intensity differences for each of the
remaining genes were analyzed. For some metrics the data were log transformed prior
to analysis. The minimum quality control values required for inclusion of a sample's
hybridization data in the study were 10% or greater present calls, a GAPDH/Actin
3'/5' ratio <5, and use of a scaling factor that was within 3 standard deviations from
the mean of the scaling values of all chips analyzed.
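
The preprocessing steps in the preceding paragraph can be summarized in a short
sketch. It is not the MAS 4.0 implementation: the function names are hypothetical,
scaling is assumed to use a plain arithmetic mean of all AIDs on the array, and the
scaling-factor check is assumed to use the sample standard deviation.

```python
from statistics import mean, stdev


def average_intensity_difference(pm, mm):
    """AID for one probe set: sum(PM - MM) / number of probe pairs."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def scale_array(aids, target=2500.0):
    """Scale one array so that its mean AID equals the target intensity."""
    factor = target / mean(aids)
    return [a * factor for a in aids], factor


def cap_and_floor(aids, absent=None, floor=100.0, cap=45000.0):
    """Set AIDs below the floor (or flagged absent) to 1 and cap high values."""
    out = []
    for i, a in enumerate(aids):
        if a < floor or (absent is not None and absent[i]):
            out.append(1.0)
        else:
            out.append(min(a, cap))
    return out


def passes_variation_filter(values, present, min_present=0.01, min_range=100.0):
    """Keep a probe set only if enough samples call it present and it varies."""
    frac_present = sum(present) / len(present)
    return frac_present >= min_present and max(values) - min(values) >= min_range


def sample_passes_qc(present_fraction, ratio_3_to_5, factor, all_factors):
    """Apply the three minimum quality control criteria to one array."""
    mu, sd = mean(all_factors), stdev(all_factors)
    return (present_fraction >= 0.10
            and ratio_3_to_5 < 5
            and abs(factor - mu) <= 3 * sd)


aid = average_intensity_difference([300, 500], [100, 100])
scaled, factor = scale_array([500.0, 2000.0])
cleaned = cap_and_floor([50.0, -20.0, 100.0, 50000.0, 3000.0])
```

Flooring values below 100 to 1 means that a probe set expressed in only a few
samples still shows a range of at least 99 AID units, so the range criterion mainly
removes probe sets that are uniformly low or uniformly flat.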

The average percent present calls for the overall data set was 29.7%, and for
each of the genetic subgroups was BCR-ABL (31.1%), E2A-PBX1 (28.9%), Hyper
>50 (31%), MLL (29.8%), T-ALL (29.1%), TEL-AML1 (28.5%), Novel (30.2%), and
others (31.1%). In addition, each sample had >75% blasts. The average percentage of
blasts for the overall data set used to define the genetic subtypes was 93%, and for
each genetic subtype was BCR-ABL (92%), E2A-PBX1 (96%), Hyper >50 (93%),
MLL (93%), T-ALL (91%), TEL-AML1 (92%), Novel (95%), and others (94%).

B. Reproducibility of Microarray Data
The reproducibility of the AFFYMETRIX® microarray system was assessed
by comparing the gene expression profiles of RNA extracted from duplicate
cryopreserved diagnostic leukemic samples from 23 patients, and of single RNA
samples from 13 patients analyzed on two separate arrays. The mean number of
probe sets that displayed a ≥2-fold difference in expression between separately
extracted but paired RNA samples was 144, and for single RNA samples analyzed on
two separate occasions was 133. Moreover, very few probe sets were found to have a
≥3-fold difference in expression levels between replicate samples. The observed
number of probe sets showing a difference in expression values represents less than
2% of the total number of probe sets on the microarray, and thus these data suggest
that the AFFYMETRIX® microarray system has a very high degree of
reproducibility.
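
The replicate comparison above amounts to counting probe sets whose expression
differs by at least a given fold between two arrays. A minimal sketch follows; the
function name is hypothetical, values are assumed to be positive (as they are after
AIDs below 100 are set to 1), and using 12,600 as the probe-set total is an
assumption taken from the array description.

```python
def count_fold_differences(rep_a, rep_b, fold=2.0):
    """Count probe sets whose two replicate values differ by >= `fold`."""
    count = 0
    for a, b in zip(rep_a, rep_b):
        high, low = max(a, b), min(a, b)
        if low <= 0:
            raise ValueError("expression values must be positive")
        if high / low >= fold:
            count += 1
    return count


# Toy replicate pair: only the second and fourth probe sets differ >= 2-fold.
n_changed = count_fold_differences([100, 400, 1, 2500], [150, 100, 1, 5000])

# Reported mean of 144 discordant probe sets, as a fraction of 12,600.
fraction_discordant = 144 / 12600
```

Under that assumption the reported mean of 144 probe sets corresponds to roughly
1.1% of the array, consistent with the "less than 2%" statement.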

C. Comparison of Expression Profiles from PB and BM Leukemia Samples
Matched BM and PB samples that contained ≥80% leukemic blasts were
obtained from 10 patients and the RNA was extracted and assessed by microarray
analysis. A very high level of correlation was observed between the expression
profiles of BM and PB, with only 189 probe sets having a greater than 2-fold
difference in expression. No genes were found to be consistently over- or under-
expressed in one sample type. These data demonstrate that there are minimal
differences in the gene expression profiles of leukemic blasts obtained from BM or
PB, and that diagnostic gene expression profiling is possible on samples obtained
from the PB.


D. RT-PCR Results 

Real-time TAQMAN® RT-PCR assays (Applied Biosystems, Foster City,
CA) were performed to independently determine the level of mRNA for five genes
that were found by microarray analysis to be predictive of either T-lineage ALL
(CD3δ, CD3D antigen delta polypeptide TiT3 complex; MAL, mal T-cell
differentiation protein; and PRKCQ, protein kinase C theta) or E2A-PBX1-expressing
ALL (MERTK, c-Mer proto-oncogene tyrosine kinase; and KIAA802). The RNA
samples analyzed included four samples each of E2A-PBX1 and T-ALL, and two
samples each from the remaining subtypes (BCR-ABL, MLL, TEL-AML1,
Hyperdiploid >50, Hyperdiploid 47-50, Hypodiploid, Pseudodiploid, and normal).
Whenever possible, the forward and reverse primers were designed in different exons
so that DNA contamination would not be a concern. In the case of MAL, where this
was not clear, the RNA was treated for 15 minutes at room temperature with 1.0 unit
of DNase I (Invitrogen Corp., Carlsbad, California) using the Invitrogen protocol to
remove any contaminating DNA.

Thirty-three ng of RNA from each sample was reverse transcribed using
random hexamers and Multiscribe Reverse Transcriptase (Applied Biosystems, Foster
City, CA) in a total volume of 10 µl. Real time PCR was performed on an Applied
Biosystems PRISM® 7700 Sequence Detection System (Applied Biosystems). All
probes were labeled at the 5' end with FAM (6-carboxy-fluorescein) and at the 3' end
with TAMRA (6-carboxy-tetramethyl-rhodamine).

The PCR reactions were performed in a total volume of 50 µl containing 10 µl
of the reverse transcriptase product, 300 nM each of the forward and reverse primers,
100 nM of probe, 1X master mix and 1 µl of AMPLITAQ GOLD® DNA polymerase
(Applied Biosystems). Following a 10 minute incubation at 95°C to activate the
polymerase, samples were denatured at 95°C for 15 seconds, then annealed and
extended at 60°C for 1 minute, for a total of 40 cycles. The RNA from each sample
was also amplified using primers and probes to RNase P (Applied Biosystems) for use
in normalization according to the manufacturer's instructions. Negative controls were
included in each run. Standard curves were generated for T-cell markers and RNase P
using RNA from MOLT4, a T-cell leukemia cell line, and for the E2A-PBX1 markers and
RNase P using a leukemia cell line, 697, that contains an E2A-PBX1 fusion.

The expression levels of the predictive genes and RNase P were determined in
each of the 24 ALL samples. A ratio was then calculated by taking the expression
value for the specific gene and dividing it by the expression level of RNase P in the
sample. These ratios were then compared to the values obtained from the
AFFYMETRIX® chip data from the same RNA sample. The raw AFFYMETRIX®
chip data were scaled as described and then normalized using the 3' GAPDH value for
each sample, yielding a normalized ratio. The TAQMAN® results and
AFFYMETRIX® chip ratios were then log transformed and compared. Since the
markers selected for TAQMAN® analysis were predictors for either E2A-PBX1 or
T-ALL, each gene was expected to have four RNA samples with high and 20 samples
with low expression. For each gene evaluated, an average expression value for both
the TAQMAN® results and AFFYMETRIX® data was calculated for all samples in
the up-regulated group, and similarly, for the samples in the down-regulated group.

E. Comparison of Real-time RT-PCR Data and AFFYMETRIX® Chip Data

The normalized gene expression ratios for the TAQMAN® data (gene/RNase
P) and for the AFFYMETRIX® microarray data (AID for a gene/AID for GAPDH)
were log transformed and then the average expression value for each gene was
calculated in the four samples in which its expression was expected to be up-regulated
and separately in the 20 samples in which its expression was expected to be down-
regulated. For example, for genes that were expected to be up-regulated in T-ALL
(CD3δ, MAL, and PRKCQ), the log expression ratios in the T-ALL samples were
averaged to give the up-regulated value and the log expression ratios of each gene in
the non-T-ALL cases were averaged to give the down-regulated value.
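
The normalization and averaging just described can be sketched as below. The base
of the logarithm is not stated in the text, so base 10 is assumed here, and the
function names are hypothetical.

```python
import math


def log_ratio(target, reference):
    """Log10 of target/reference (e.g. gene/RNase P, or gene AID/GAPDH AID)."""
    return math.log10(target / reference)


def group_means(log_ratios, up_indices):
    """Average log ratios separately over the up- and down-regulated samples."""
    up = [v for i, v in enumerate(log_ratios) if i in up_indices]
    down = [v for i, v in enumerate(log_ratios) if i not in up_indices]
    return sum(up) / len(up), sum(down) / len(down)


# Toy example: samples 0 and 1 are expected to over-express the gene.
example_ratio = log_ratio(100.0, 10.0)
up_mean, down_mean = group_means([2.0, 2.0, 0.0, 0.0, 0.0, 1.0], {0, 1})
```

Averaging in log space keeps a single very high sample from dominating the group
value as strongly as it would on the raw ratio scale.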

In both the TAQMAN® and the microchip array analysis, MERTK and
KIAA802 were very highly expressed in the diagnostic samples containing E2A-
PBX1, and expressed at low levels in all of the other samples. Likewise, PRKCQ,
CD3δ, and MAL showed high levels of expression in T cells by both methodologies
in comparison with non-T cells. The normalized ratios from the TAQMAN® assay
were plotted against the normalized ratios from the microchip array for both the up-
regulated and down-regulated genes. The correlation between TAQMAN® results
and the microchip array results was 70%, indicating that the same pattern of gene
expression was seen in both analyses. MERTK expression was extremely high in two of the
E2A-PBX1 patient samples by TAQMAN® analysis. Removal of the MERTK gene
from the analysis resulted in a correlation of 91% between the TAQMAN® results
and the microchip array results.
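
The text does not say which correlation measure was used for the platform
comparison. As an illustration only, and assuming a Pearson correlation coefficient
on the paired log ratios, the calculation could look like this (the function name
and toy values are hypothetical):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy paired values standing in for RT-PCR and array log ratios.
r_same = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_opposite = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A coefficient of this kind is sensitive to a few extreme points, which is in line
with the observation that removing MERTK changed the reported correlation markedly.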

F. Comparison of AFFYMETRIX® Microarray Chip Results and Immunophenotype Results

Leukemic blasts at the time of diagnosis were analyzed for expression of
lineage restricted cell surface antigens using phycoerythrin- or fluorescein
isothiocyanate-conjugated monoclonal antibodies against CD2, CD3ε, CD4, CD5,
CD7, CD8, CD10, CD19, and CD22 (Becton Dickinson Immunocytometry Systems,
San Jose, CA, USA). Data were obtained using a COULTER® EPICS XL™
(Beckman Coulter, Miami, FL), a COULTER® ELITE™ (Beckman Coulter), or a
BD FACSCalibur™ flow cytometer (Becton Dickinson, San Jose, CA). The
expression patterns for these antigens were then compared to gene expression patterns
for the AFFYMETRIX® chip sites specified for CD2 (1 probe set, 40738_at), CD3δ
(1 probe set, 38319_at), CD3ε (1 probe set, 36277_at), CD3ζ (1 probe set, 37078_at),
CD3γ (1 probe set, 39226_at), CD4 (5 probe sets, 856_at, 1146_at, 35517_at,
34003_at, and 37942_at), CD5 (1 probe set, 32953_at), CD7 (1 probe set, 771_s_at),
CD8α (1 probe set, 40699_at), CD8β (1 probe set, 39239_at), CD10 (1 probe set,
1389_at), CD19 (2 probe sets, 1096_g_at and 1116_at), and CD22 (2 probe sets,
38521_at and 38522_s_at). As a control, the performance of the AFFYMETRIX®
microarray probe sets was also assessed using RNA isolated from flow sorted single
positive CD4+ and CD8+ thymocytes, and CD10+/CD19+ bone marrow cells. High
RNA expression was observed in T-ALL for the T-lineage restricted genes CD2,
CD3δ, ε, and ζ, CD8α, and CD7, and in B-lineage ALLs for the B-cell restricted
genes CD19 and CD22. A similar high level of correlation was observed between
RNA and protein expression for CD10. The observed low expression levels of T-cell
restricted genes in B-cell cases, and B-cell restricted genes in T-ALLs, are consistent
with the low level of normal contaminating lymphocytes present in the diagnostic
marrow samples analyzed.

G. Patient Data Set 

A total of 389 pediatric acute leukemia samples were analyzed in this study,
from which high quality gene expression data were obtained on 360 (93%). The
successfully analyzed samples included: 332 diagnostic bone marrows (BM), 3
diagnostic peripheral blood samples (PB), and 25 relapse ALL samples from BM or
PB. Of the diagnostic ALL BM samples, 264 (79%), together with all relapse samples,
were from patients treated on St. Jude Children's Research Hospital Total Therapy
Studies XIIIA or XIIIB and correspond to 64% of the patients treated on these protocols.
The details of these protocols are described in Pui et al., "Risk-adapted treatment for acute
lymphoblastic leukemia: findings from St. Jude Children's Research Hospital,"
Haematology and Blood Transfusions, 1997, pp 629-37, Springer-Verlag, Berlin and
in Pui et al. (2000) Leukemia 14:2286-94. Study XIIIA ran from December 20, 1991
to August 23, 1994 and enrolled 165 patients, whereas Study XIIIB ran from August
24, 1994 to July 27, 1998 and enrolled 247 patients. No patients were lost to follow-up
during treatment. When the databases were frozen for analysis, 100% and 93% of
event-free survivors in studies XIIIA and XIIIB, respectively, had been seen within 12
months. The median (minimum, maximum) follow-up of the event-free survivors
was 8.09 (6.59, 9.94) and 4.52 (2.37, 7.06) years for XIIIA and XIIIB, respectively.
All other samples were obtained from patients treated on St. Jude Total Therapy
Studies XI, XII, XIV, XV, or by best clinical management.
For the identification of gene expression profiles that predict specific genetic
subtypes of ALL, 327 diagnostic BM samples were used. The criteria for inclusion in
this data set were the availability of a cryopreserved diagnostic BM sample containing
>75% blasts, and complete data from each of the following diagnostic studies:
morphology, immunophenotype, cytogenetics, DNA ploidy, Southern blot for MLL
gene rearrangements, and RT-PCR analysis for MLL-AF4, MLL-AF9, E2A-PBX1,
TEL-AML1, and BCR-ABL. This final data set includes diagnostic BM samples
from XV (38), XIV (4), XIIIA (100), XIIIB (161), or from patients treated on one of
our older protocols or by best clinical management (24).

The data sets used to identify expression profiles predictive of hematologic
relapse and the development of therapy-induced AML are described in Table 1.

Table 1: Patient Database 



Diagnostic samples used for subtype classification (n=327) 







BCR-ABL subgroup (n=15)

Label®                  Protocol#  Outcome%        Label®                  Protocol#  Outcome%
BCR-ABL-C1              T13B       CCR             BCR-ABL-#4              T11        NA
BCR-ABL-R1              T13A       Heme Relapse    BCR-ABL-#5              T12        NA
BCR-ABL-R2              T13A       Heme Relapse    BCR-ABL-#6              T12        NA
BCR-ABL-R3              T13B       Heme Relapse    BCR-ABL-#7              T12        NA
BCR-ABL-Hyperdip-R5     T13B       Heme Relapse    BCR-ABL-#8              T14        NA
BCR-ABL-#1              T13A       Censored        BCR-ABL-#9              T15        NA
BCR-ABL-#2              T13B       Censored        BCR-ABL-Hyperdip-#10    T12        NA
BCR-ABL-#3              T13B       Censored











E2A-PBX1 subgroup (n=27)

Label®                  Protocol#  Outcome%        Label®                  Protocol#  Outcome%
E2A-PBX1-C1             T13A       CCR             E2A-PBX1-#1             Others     NA
E2A-PBX1-C2             T13A       CCR             E2A-PBX1-#2             Others     NA
E2A-PBX1-C3             T13A       CCR             E2A-PBX1-#3             Others     NA
E2A-PBX1-C4             T13A       CCR             E2A-PBX1-#4             Others     NA
E2A-PBX1-C5             T13A       CCR             E2A-PBX1-#5             Others     NA
E2A-PBX1-C6             T13B       CCR             E2A-PBX1-#6             Others     NA
E2A-PBX1-C7             T13B       CCR             E2A-PBX1-#7             T11        NA
E2A-PBX1-C8             T13B       CCR             E2A-PBX1-#8             T11        NA
E2A-PBX1-C9             T13B       CCR             E2A-PBX1-#9             T12        NA
E2A-PBX1-C10            T13B       CCR             E2A-PBX1-#10            T12        NA
E2A-PBX1-C11            T13B       CCR             E2A-PBX1-#11            T14        NA
E2A-PBX1-C12            T13B       CCR             E2A-PBX1-#12            T15        NA
E2A-PBX1-R1             T13B       Heme Relapse    E2A-PBX1-#13            T15        NA
E2A-PBX1-2M#1           T13B       2nd AML



Hyperdip>50 subgroup (n=64)



Hyperdip>50-Cl 


T13A 


CCR 


Hvr>erdiD>50-C2 


T13A 


CCR 


Hyperdip>50-C3 


T13A 


CCR 


xiyperaip-^j u-u>f 


T1 1 A 

1 1JA 


CCR 


Hyperdip>50-C5 


T13A 


CCR 


Hyperdip>50-C6 


T13A 


CCR 


Hyperdip>50-C7 


T13A 


CCR 


Hyperdip>50-C8 


T13A 


CCR 


Hyperdip>50-C9 


T13A 


CCR 


Hyperdip>50-C10 


T13A 


CCR 


Hyperdip>50-Cll 


T13A 


CCR 


Hyperaip>3U-C 1 1 


ti a a 
1 IdJ\ 


PPT? 


HvnerdiD>50-C 1 ^ 


T13A 


CCR 


Hyperdip>50-C14 


T13A 


CCR 


Hyperdip>50-C15 


T13B 


CCR 


Hyperdip>50-C16 


T13B 


CCR 


Hyperdip>50-C17 


T13B 


CCR 


Hyperdip>50-C18 


T13B 


CCR 


Hyperdip>50-C19 


T13B 


CCR 


Hyperdip>50-C20 


T13B 


CCR 


Hyperdip>50-C21 


T13B 


CCR 


Hyperdip>50-C22 


T13B 


CCR 


Hyperdip>50-C23 


T13B 


CCR 


Hyperdip>50-C24 


T13B 


CCR 


Hyperdip>50-C25 


T13B 


CCR 


Hyperdip>50-C26 


T13B 


CCR 


Hyperdip>50- 






C27-N 


T13B 


CCR 


Hyperdip>50-C28 


T13B 


CCR 


Hyperdip>50-C29 


T13B 


CCR 


riyperaip-^o u-v^ d \j 


JL A -J JO 


CCR 


riypei aip^jv-^j i 


1 1 JD 




Hyperdip>50-C32 


T13B 


CCR 


Hyperdip47-50- 






Cl 


T13A 


CCR 


Hyperdip47-50- 






C2 


T13A 


CCR 


Hyperdip47-50- 






C3-N 


T13A 


CCR 


Hyperdip47-50- 






C4 


T13A 


CCR 


Hyperdip47-50- 






C5 


T13A 


CCR 



0 y pci sALyj^ j \j \~^~> ~> 


T13B 


CCR 


Myperaip-^Z) u-v-o^ 


1 x~>XJ 


CCR 


Hyperdip>50-C35 


T13B 


CCK 


Hyperdip>50-C36 


T13B 


CCR 


Hyperdip>50-C37 


T13B 


CCR 


Hyperdip>50-C38 


T13B 


CCR 


Hyperdip>50-C39 


T13B 


CCR 


Hyperdip>50-C40 


T13B 


CCR 


Hyperdip>50-C41 


T13B 


CCR 


Hyperdip>50-C42 


T13B 


CCR 


xiypeiaip-^z) u-^hj 


T13B 


CCR 






Heme 


Hyperdip>50-Rl 


T13A 


Relapse 




Heme 


Hyperdip>50-R2 


T13A 


Relapse 




Heme 


Hyperdip>50-R3 


T13A 


Relapse 




Heme 


Hyperdip>50-R4 


T13B 


Relapse 
Heme 


riyperaip-^D u-ixj 


A 1 JJJ 


Relanse 


Hyperdip>50-2M#1 


T13A 


2nd AML 


Hyperdip>50-2M#2 


T13B 


2nd AML 


Hyperdip>50-#1 


T13A 


Censored 


Hyperdip>50-#2 


T13B 


Censored 


Hyperdip>50-#3 


Others 


"NT A 
JNA 


Hyperdip>50-#4 


Others 


XT A 

JNA 


Hyperdip>50-#5 


T12 


XT A 

NA 


Hyperdip>50-#6 


1 Id 


XT A 


Hyperdip>50-#7 


T15 


NA 


Hyperdip>50-#8 


T15 


NA 


Hyperdip>50-#9 


T15 


NA 


Hyperdip>50-#10 


T15 


NA 


Hyperdip>50-#11 


T15 


NA 


Hyperdip>50-#12 


T15 


NA 


Hyperdip>50-#13 


T15 


NA 


Hyperdip>50-#14 


T15 


NA 



Hyperdip47-50 subgroup (n=23)



Hyperdip47-50-C13 
Hyperdip47-50-C14-N 
Hyperdip47-50-C15 
Hyperdip47-50-C16 
Hyperdip47-50-C17
T13B
T13B 
T13B 
T13B 
T13B 



CCR 
CCR 
CCR 
CCR 
CCR
Hyperdip47-50-

C6 T13B 

Hyperdip47-50- 

C7 T13B 

Hyperdip47-50- 

CS T13B 

Hyperdip47-50- 

C9 T13B 

Hyperdip47-50- 

C10 T13B 

Hyperdip47-50- 

Cll T13B 

Hyperdip47-50- 

C12 T13B 

Hypodip-Cl T13A 

Hypodip-C2 T13A 

Hypodip-C3 T13B 

Hypodip-C4 T13B 

Hypodip-C5 T13B 

MLL-C1 T13A 

MLL-C2 T13B 

MLL-C3 T13B 

MLL-C4 T13B 

MLL-C5 T13B 

MLL-C6 T13B 

MLL-R1 T13A 

MLL-R2 T13A 

MLL-R3 T13B 

MLL-R4 T13B 

Normal-Cl-N T13A 

Normal-C2-N T13A 

Normal-C3-N T13A 

Normal-C4-N T13B 

Normal-C5 T13B 

Normal-C6 T13B 

Normal-C7-N T13B 

Normal-C8 T13B 

Normal-C9 T13B 

Pseudodip-Cl T13A 

Pseudodip-C2-N T13A 

Pseudodip-C3 T13A 

Pseudodip-C4 T13A 

Pseudodip-C5 T13A 



CCR 




Hyperdip47-50-C 1 o 




CCR 


CCR 




Hyperaip4 /ou-C l y 




CCR 


CCR 




Hyperaip4 /oU-zMffi 


T1 ^ A 


">nd AML 


CCR 




xiyperaip^ /ou-ffi 


T15 


NA 


CCR 




nyperuip'r / -ju-n-z. 


T15 


NA 


CCR 




Hyperdip47-50-#3 


T15 


NA 


CCR 












Hypodip subgroup (n=9)






CCR 




Hypodip-C6 


T13B 


CCR 


CCR 




Hypodip-2M#l 


T13A 


2nd AML 


CCR 




Hypodip-#l 


T15 


XT A 

ISA 


CCR 




Hypodip-#2 


T15 


XT A 

In A 


CCR 












MLL subgroup (n=20)






CCR 




MLL-2M#1 


1 13A 


J AAifT 

znu /vlvll 


CCR 




MLL-2M#2 


1 13A 


zna /viviiv 


CCR 




MLL-#1 


T1 ^T* 

1 IjD 


^p»n c r\rf*r\ 


CCR 




MLL-#2 


TI IT} 


\^eiis»oreu. 


CCR 




MLL-#3 


utners 


XT A 
J/N /A. 


CCR 




MLL-#4 


Others 


NA 


Heme 


Relapse 


MLL-#5 


Others 


NA 


Heine 


Relapse 


MLL-#6 


T12 


NA 


Heme 


Relapse 


MLL-#7 


T14 


XT A 

In A 


Heme 


Relapse 


MLL-#8 


T»1 A 

T14 


XT A 

JNA 




Normal subgroup (n=18)






CCR 




Normal-CIO 


T13B 


CCR 


CCR 




Normal-Cl 1-N 


T13B 


CCR 


CCR 




Normal-C12 


T13B 


CCR 










Heme 


CCR 




Normal-Rl 


T13A 


Relapse 










Heme 


CCR 




Normal-R2-N 


T13B 


Relapse 










Heme 


CCR 




Normal-R3 


T13B 


Relapse 


CCR 




Normal-#l 


T13A 


Censored 


CCR 




Normal-#2 


1 1313 


v^ensoreu 


CCR 




j Normal-#3 


1 13B 


Censored 




Pseudodip subgroup (n=29)






CCR 




Pseudodip-Cl 6-N 


T13B 


CCR 


CCR 




Pseudodip-Cl 7 


T13B 


CCR 


CCR 




Pseudodip-Cl 8 


T13B 


CCR 


CCR 




Pseudodip-Cl 9 


T13B 


CCR 








Heme 


CCR 




Pseudodip-Rl-N 


T13A 


Relapse
Pseudodip-C6


*~ri i A 




Pseudodip-#l 


T13B 


Pseudodip-L/7 


T1 CI A 




P <;eudodio-#2 


T13B 


Pseudodip-Cb 






P seudo dit)-#3 


Others 


i^seuaoaip-^y 


T1 ^ A 


CCR 


Pseudodip-#4 


Others 


Pseudodip-CIO 


T13B 


CCR 


Pseudodip-#5 


T15 


Pseudodip-Cll 


T13B 


CCR 


Pseudodip-#6 


T15 


Pseudodip-C12 


T13B 


CCR 


Pseudodip-#7 




Pseudodip-C13 


T13B 


CCR 


Pseudodip-#8-N 


T15 


Pseudodip-C14 


1 13t> 


CCR 


Pseudodip-#9 


T15 


Pseudodip-C15 


1 13r> 


CCR 










T-ALL subgroup (n=43)




T-ALL-C1 


T13A 


CCR 


T-ALL-C23 


T13B 


T-ALL-C2 


T13A 


CCR 


T-ALL-C24 


T13B 


T» ATT 

T-ALL-C3 


T1 Q A 


CCR 


T-ALL-C25 


T13B 


T-ALL-C4 


T13A 


CCR 


T-ALL-C26 


T13B 


T-ALL-C5 


T13A 


CCR 


T-ALL-R1 


T13A 


T-ALL-C6 


T13A 


CCR 


T" ATT T> O 

T-ALL-R2 


T13B 


T-ALL-C7 


T13A 




T-ALL-R3 


T13B 


T-ALL-C8 


T1 1 A 


CCR 


T-ALL-R4 


T13B 


T* ATT 

T-ALL-C9 


Ti in 
1 13d 


CCR 


T-ALL-R5 


T13B 


T-ALL-C10 


T13B 


CCR 


T-ALL-R6 


T13B 


T-ALL-C11 


T13B 


CCR 


T-ALL-2M#1 


T13B 


T-ALL-C12 


T13B 


CCR 


T-ALL-#1 




rp ATT /"** 1 ^ 

T-ALL-C13 


T1 I'D 


CCR 


T-ALL-#2 


T13B 


T» ATT /"^« 1 /I 

T-ALL-C14 


1 lirJ 


CCR 


T-ALL-#4 


T13B 


rr-> ATT 1 C 

T-ALL-C15 


T1 IT* 


CCR 


T-ALL-#5 


T13B 


T ATT A 


T1 


CCR 


T-ALL-#6 


T15 


<t> ATT /~M "7 


1 1jJ3 


CCR 


T-ALL-#7 


T15 


T-ALL-C18 


T13B 


CCR 


T-ALL-#8 


T15 


T-ALL-C19 


T13B 


CCR 


T-ALL-#9 


T15 


T-ALL-C20 


T13B 


CCR 


T-ALL-#10 


T15 


T-ALL-C21 


T13B 


CCR 


T-ALL-#11 


T15 


T-ALL-C22 


T13B 


CCR 







Other 
Relapse 
Censored 

NA 

NA 

NA 

NA 

NA 

NA 

NA 



CCR 
CCR 
CCR 
CCR 
Heme 
Relapse 
Heme 
Relapse 
Heme 
Relapse 
Heme 
Relapse 
Heme 
Relapse 
Heme 
Relapse 
2nd AML 
Other 
Relapse 
Other 
Relapse 
Censored 
Censored 
NA 
NA 
NA 
NA 
NA 
NA 



TEL-AML1 subgroup (n=79) 



TEL-AML1-C1 


T13A 


CCR 


TEL-AML1-C41 


T13B 


CCR 


TEL-AML1-C2 


T13A 


CCR 


TEL- AML 1 -C42 


T13B 


CCR 


TEL-AML1-C3 


T13A 


CCR 


TEL-AML1-C43 


T13B 


CCR 


TEL-AML1-C4 


T13A 


CCR 


TEL-AML1 -C44 


T13B 


CCR 


TEL-AML1-C5 


T13A 


CCR 


TEL- AML 1 -C45 


T13B 


CCR 


TEL-AML1-C6 


T13A 


CCR 


TEL- AML 1 -C46 


T13B 


CCR 


TEL-AML1-C7 


T13A 


CCR 


TEL- AML 1 -C47 


T13B 


CCR 


TEL-AML1-C8 


T13A 


CCR 


TEL-AML1 -C48 


T13B 


CCR 


TEL-AML1-C9 


T13A 


CCR 


TEL-AML1 -C49 


T13B 


CCR 


TEL-AML1-C10 


T13A 


CCR 


TEL-AML1-C50 


T13B • 


CCR
TEL-AML1-C11


ri3A 




TFT -AML1-C51 


T13B 


CCR 


TEL-AMLl -CI 2 


T13A 




TFT -AML1-C52 


T13B 


CCR 


TEL-AMLl -CI 3 


T13A 


CCR 


TEL-AML1-C53 


T13B 


CCR 


TEL-AML1-C14 


T13A 


CCR 


TEL- AML 1 -C54 


T13B 


CCR 


TUT A A/17 1 CI ^ 




CCR 


TEL-AML1-C55 


T13B 


CCR 


TEL-AML1-C16 


T13A 


CCR 


TEL-AML1-C56 


T13B 


CCR 


TEL-AMLl -CI 7 


T13A 




TFT -ATV/TT 1 


T13B 


CCR 
Heme 


TEL-AMLl -CI 8 


T13A 


CCR 


TEL-AML1-R1 


T13A 


Relapse 
Heme 


TEL-AML1-C19 


T13A 


CCR 


TEL-AML1-R2 




iveiapse 
Heme 


TEL- AML 1 -C20 


T13A 




TFT - A A/IT 1 -R^ 


T13B' 


Relapse 


TEL-AMLl -C21 


T13A 


CCR 


TEL- AML 1 -2M# 1 


T13A 


2nd AML 


TEL-AMLl -C22 


T13A 


CCR 


TEL- AML 1 -2M#2 


T13A 


2nd AML 


TEL-AML1-C23 


T13A 


CCR 


TEL- AML 1 -2M#3 


1 I JA 


OrtA AA/fT 

zno Aivix-f 


TEL- AML 1 -C24 


T13A 


CCR 


TEL-AMLl -2M#4 


TT1 I'D 


Or\A AAyfT 
ZDQ f\i\xx-i 


TEL-AML1-C25 


T13A 


CCR 


TEL-AML1-2M#5 


T1 I'D 

1 1 Jr> 


In A A A/IT 
ZnQ AIVIJU 

Other 


TEL-AMLl -C26 


T13A 




TFT -AMI 


T13B 


Relapse 


TEL-AMLl -C27 


T13A 


CtK 


TFT A1VTT l-^O 


T13A 


Censored 


TEL-AML1-C2S 


T13A 




TFT -AMLl-#3 


T13A 


Censored 


TEL-AML1-C29 


T13B 


LLK 


TFT - AMI 1 -#4 
1 x3X^--rvivxx_»A rr*-r 


T13B 


Censored 


TEL- AML 1 - C3 0 


1 13B 


/""•/"I'D 


TFT - AMI l-#5 


T15 


NA 


TEL-AMLl -C31 


T13B 


CL-K 


TFT -AMI 1-#6 

X L->J -— IVI 1^ 1 TTVJ 


T15 


NA 


TEL-AML1-C32 


T13B 


CCK 


TFT - AMI l-#7 


T15 


NA 


TEL-AML1-C33 


T13B 


CCR 


TFT AlV/fT 


T15 


NA 


TEL-AML1-C34 


T13B 


CCK 


TFT AA/fT 1 -#Q 
i r>i-»-^AJVxxv i try 


T15 


NA 


TEL- AML 1 -C3 J 


T1 "5X3 




TEL-AML1-#10 


T15 


NA 


TEL- AML 1 - C3 6 


T13B 


CCR 


TP7T A AvTT 1 
1 xz,X->~2\x\XX-i 1 -rr X l 


T15 


NA 


TEL-AMLl -C37 


T13B 


CCR 


TEL-AMLl -#12 


T15 


NA 


TEL-AML1-C3 8 


T13B 


CCR 


TEL-AML1-#13 


T15 


NA 


TEL-AML1-C39 


T13B 


CCR 


TEL-AMLl -#14 


T15 


NA 


TEL-AML1-C40 


T13B 


CCR 









® Label key:
Subtype Name-C#    Dx sample of patient in CCR
Subtype Name-R#    Dx sample of patient who developed a hematologic relapse
Subtype Name-#     Dx sample used for subgroup classification only
Subtype Name-2M#   Dx sample of patient who later developed 2nd AML
Subtype Name-N     Dx sample in novel group

# Protocol: Protocol that patient was treated on

% Outcome:
CCR             Continuous complete remission
Heme Relapse    Hematologic relapse
Other Relapse   Extramedullary relapse
2nd AML         Diagnostic samples of patients who later developed 2nd AML
Censored        Censored due to BM transplant, treated off protocol, or died in CR
NA              Not applicable, primarily because the patient was not treated on
                Total 13, and thus is excluded from the analysis used to identify
                gene expression profiles predictive of outcome
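
The labeling convention in the key can be read mechanically. The following is a minimal sketch, assuming labels always take the form "subtype name, hyphen, category code, optional running number"; the names `CATEGORIES`, `LABEL_RE` and `parse_label` are hypothetical helpers introduced here for illustration and are not part of the original analysis.

```python
import re

# Hypothetical mapping from the category code in a label to its meaning in the key.
CATEGORIES = {
    "C": "continuous complete remission",
    "R": "hematologic relapse",
    "#": "subgroup classification only",
    "2M#": "later developed 2nd AML",
    "N": "novel group",
}

# Subtype name, a hyphen, the category code, then an optional running number.
# "2M#" is listed first so that it is not mistaken for the bare "#" code.
LABEL_RE = re.compile(r"^(?P<subtype>.+)-(?P<code>2M#|C|R|#|N)(?P<number>\d*)$")


def parse_label(label):
    """Split a sample label such as 'TEL-AML1-2M#5' into its three parts."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized sample label: {label!r}")
    number = int(match["number"]) if match["number"] else None
    return match["subtype"], CATEGORIES[match["code"]], number


example = parse_label("TEL-AML1-2M#5")
```

Because the subtype names themselves contain hyphens (TEL-AML1, T-ALL), the pattern anchors on the last hyphen that is followed by a recognized category code rather than splitting on the first hyphen.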



H. Diagnostic Samples Used for Prediction of Prognosis

In addition to the 201 CCR and 27 Heme Relapse cases listed in Table 1, five
additional relapse cases were also included in the prognostic analysis, giving a total of
233 cases for this analysis. These additional cases were not included in the subgroup
prediction data set because they did not meet the established criteria for the reasons
listed below.

Label | Protocol | Comment
BCR-ABL-R4 | T13B | Did not meet QC criteria because contained 70% blasts
MLL-R5 | T13A | Peripheral Blood Sample (90% blasts)
Normal-R4 | T13B | Molecular studies not performed
T-ALL-R7 | T13A | Peripheral Blood Sample (90% blasts)
T-ALL-R8 | T13B | Peripheral Blood Sample (90% blasts)



I. Diagnostic Samples Used for Prediction of Secondary AML

In addition to the 201 CCR and 13 secondary AML cases listed in Table 1,
three additional diagnostic marrow samples from patients who developed secondary
AML were also included in the prognostic analysis. This gives a total of 217 cases
used for this analysis. These additional cases were not included in the diagnostic data
set because they did not meet the established criteria for the reasons listed below.

Label | Protocol | Comment
Hyperdip>50-2M#3 | T12 | Non Total 13 diagnostic sample
Hypodip-2M#2 | T13B | No molecular studies performed
Hypodip-2M#3 | T12 | Non Total 13 diagnostic sample

Relapsed Samples (n=25)

Twenty-five relapse samples were analyzed: 17 samples that were paired with
the diagnostic samples listed above (Subtype Name-2M#), and 8 additional
non-paired relapse samples.

Detailed Analysis

A. Hierarchical cluster analysis of diagnostic cases using all genes that passed the
variation filter

Two-dimensional hierarchical clustering was performed using the Pearson
correlation coefficient and the unweighted pair group method using arithmetic
averages (UPGMA) (GeneMaths, version 1.5). The results of hierarchical clustering of the 327
diagnostic samples using the 10,991 probe sets that passed the variation filter can be
viewed at our web site, www.stjuderesearch.org/ALL1.
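
As an illustration of the clustering described above, the following is a minimal sketch, assuming a distance of 1 minus the Pearson correlation and plain average linkage. It is not the GeneMaths implementation; the function names and the toy expression matrix are made up for the example, and constant profiles (zero variance) are assumed not to occur.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumes neither profile is constant (a zero norm would divide by zero).
    return 1.0 - cov / (norm_x * norm_y)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns merges as (a, b, distance)."""
    n = len(profiles)
    base = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def average_distance(a, b):
        # Unweighted mean over every pair of original items in the two clusters.
        total = sum(base[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        dist, a, b = min((average_distance(a, b), a, b)
                         for k, a in enumerate(clusters)
                         for b in clusters[k + 1:])
        clusters = [c for c in clusters if c != a and c != b]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, dist))
    return merges


# Toy expression matrix: rows are probe sets, columns are samples (made-up values).
expression = [
    [1.0, 2.0, 9.0, 8.0],
    [2.0, 3.0, 8.0, 9.5],
    [9.0, 8.0, 1.0, 2.0],
    [7.0, 9.0, 2.0, 1.0],
]
samples = [list(column) for column in zip(*expression)]

gene_merges = upgma(expression)    # cluster the rows (probe sets)
sample_merges = upgma(samples)     # cluster the columns (samples)
```

Running the same routine once on the rows and once on the columns is what makes the clustering two-dimensional: the two merge orders give the row and column orderings of the clustered matrix.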

B. Methods for gene selection

Discriminating genes for the various leukemia subtypes were selected using a
variety of statistical metrics. The individual metrics used and the list of selected probe
sets and corresponding genes are given below.

1. Chi-Square

The Chi square method evaluates each gene individually by measuring its Chi
square statistic with respect to the classes. The method first discretizes the observed
expression values of the gene into several intervals using an entropy-based
discretization method¹. The Chi square statistic of a gene is then calculated as
$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$, where $m$ is the number of
intervals and $k$ is the number of classes. $A_{ij}$ is the
number of samples in the $i$th interval that are of the $j$th class. $E_{ij}$ is the expected
frequency of $A_{ij}$ and is calculated as $E_{ij} = R_i \cdot C_j / N$, where $R_i$ is the number of
samples in the $i$th interval, $C_j$ is the number of samples in the $j$th class, and $N$ is the
total number of samples. The genes are then sorted according to their Chi square
statistics: the larger the Chi square statistic, the more important the gene. The 40
genes with the highest Chi square statistics in each subtype are listed in Tables 2-8.
Generally, using anywhere from the top 20 to 40 genes did not result in significant
differences in subtype prediction accuracy. Therefore, only the top 20 genes were used
in subtype prediction, unless noted otherwise.
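
The ranking procedure above can be sketched as follows. This is an illustrative sketch only: the text does not spell out the entropy-based discretization, so `best_entropy_cut` is an assumed simplification that picks a single cut point minimizing class entropy (giving two intervals, "above" and "below"). All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def best_entropy_cut(values, classes):
    """Single cut point minimizing weighted class entropy (assumed simplification)."""
    pairs = sorted(zip(values, classes))
    best = None
    for idx in range(1, len(pairs)):
        low, high = pairs[idx - 1][0], pairs[idx][0]
        if low == high:
            continue  # no boundary between equal expression values
        left = [c for _, c in pairs[:idx]]
        right = [c for _, c in pairs[idx:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, (low + high) / 2)
    return None if best is None else best[1]


def chi_square(intervals, classes):
    """Chi square statistic of one discretized gene against the class labels."""
    n = len(intervals)
    observed = Counter(zip(intervals, classes))   # A_ij
    per_interval = Counter(intervals)             # R_i
    per_class = Counter(classes)                  # C_j
    stat = 0.0
    for i in per_interval:
        for j in per_class:
            expected = per_interval[i] * per_class[j] / n   # E_ij = R_i * C_j / N
            stat += (observed[(i, j)] - expected) ** 2 / expected
    return stat


def rank_genes(expression, classes, top=20):
    """expression maps a gene name to its list of values, one per sample."""
    scores = {}
    for gene, values in expression.items():
        cut = best_entropy_cut(values, classes)
        if cut is None:
            continue  # constant gene: cannot be discretized
        intervals = ["above" if v > cut else "below" for v in values]
        scores[gene] = chi_square(intervals, classes)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy data (made up): three samples of one subtype and five of all other classes.
classes = ["subtype"] * 3 + ["other"] * 5
expression = {
    "geneA": [9.1, 8.7, 9.5, 1.2, 0.8, 1.5, 1.1, 0.9],   # separates the classes
    "geneB": [5.0, 1.0, 5.2, 4.9, 1.1, 5.1, 0.9, 1.2],   # does not
}
ranking = rank_genes(expression, classes, top=2)
```

With two intervals and two classes, a gene whose cut point separates the classes perfectly reaches the largest possible statistic, which equals the number of samples N.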
Table 2. Genes selected by Chi square: BCR-ABL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 1637_at | mitogen-activated protein kinase-activated protein kinase 3 | MAPKAPK3 | U09578 | 62.75 | Above
2 | [illegible] | cyclin D2 | CCND2 | D13639 | 59.79 | Above
3 | 40196_at | HYA22 protein |  | D88153 | 54.79 | Above
4 | [illegible] | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 54.77 | Above
5 | [illegible] | [illegible] cysteine protease | CASP8 | X98176 | 49.70 | Above
6 | 1636_g_at | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 48.29 | Above
7 | 41295_at | GTT1 protein | GTT1 | AL041780 | 42.60 | Above
8 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | 42.60 | Above
9 | 37012_at | capping protein actin filament muscle Z-line beta | CAPZB | U03271 | 38.46 | Above
10 | [illegible] | alkylglycerone phosphate synthase | AGPS | Y09443 | 38.46 | Above
11 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | 37.83 | Above
12 | 34362_at | solute carrier family 2 facilitated glucose transporter member 5 | SLC2A5 | M55531 | 37.54 | Above
13 | 33150_at | disrupter of silencing 10 | SAS10 | AI126004 | 36.95 | Above
14 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | 36.95 | Above
15 | [illegible] | bone marrow stromal cell antigen 2 | BST2 | D28137 | 36.95 | Above
16 | 33172_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | 36.95 | Above
17 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 36.95 | Above
18 | [illegible] | protease cysteine 1 legumain | PRSC1 | D55696 | 36.95 | Above
19 | [illegible] | calponin 3 acidic | CNN3 | S80562 | 33.94 | Above
20 | [illegible] | tubulin alpha 1 isoform 44 | TUBA1 | HG2259-HT2348 | 33.32 | Above
21 | 40504_at | paraoxonase 2 | PON2 | [illegible] | [illegible] | Above
22 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 30.47 | Above
23 | 39044_s_at | diacylglycerol kinase delta 130kD | DGKD | D73409 | 29.59 | Below
24 | 36634_at | BTG family member 2 | BTG2 | U72649 | 29.16 | Below
25 | 38119_at | glycophorin C Gerbich blood group | GYPC | X12496 | 29.16 | Above
26 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | 27.96 | Above
27 | 33228_g_at | interleukin 10 receptor beta | IL10RB | AI984234 | 27.70 | Below
28 | 37006_at | step II splicing factor SLU7 | SLU7 | AI660656 | 27.15 | Above
29 | 38641_at | Homo sapiens mRNA for TSC-22-like protein |  | AJ133115 | 27.15 | Above
30 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 27.15 | Above
31 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 26.46 | Above
32 | 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | 25.90 | Above
33 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 25.90 | Above
34 | 36035_at | anchor attachment protein 1 Gaa1p yeast homolog | GPAA1 | AB002135 | 25.34 | Above
35 | 980_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 25.29 | Above
36 | 671_at | secreted protein acidic cysteine-rich osteonectin | SPARC | J03040 | 25.29 | Above
37 | 40698_at | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | 23.80 | Above
38 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 23.70 | Above
39 | 1983_at | cyclin D2 | CCND2 | X68452 | 23.70 | Above
40 | 2001_g_at | ataxia telangiectasia mutated | ATM | U26455 | 22.60 | Above



Table 3: Genes selected by Chi square for E2A-PBX1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 41146_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
2 | 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
3 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 187.00 | Above
4 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 187.00 | Above
5 | 430_at | nucleoside phosphorylase | NP | X00737 | 187.00 | Above
6 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 176.11 | Above
7 | 753_at | nidogen 2 | NID2 | D86425 | 164.28 | Above
8 | 33821_at | Human DNA sequence from clone RP3-483K16 on chromosome 6p12.1-21.1 | HELO1 | AL034374 | 155.00 | Above
9 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 153.46 | Above
10 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 143.85 | Above
11 | 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 142.34 | Above
12 | 39929_at | KIAA0922 protein | KIAA0922 | AB023139 | 139.97 | Above
13 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 |  | AL049397 | 139.49 | Above
14 | 717_at | GS3955 protein | GS3955 | D87119 | 135.24 | Above
15 | 362_at | protein kinase C zeta | PRKCZ | Z15108 | 131.36 | Above
16 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | 131.36 | Above
17 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 131.36 | Above
18 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 130.95 | Above
19 | 35974_at | lymphoid-restricted membrane protein | LRMP | U10485 | 123.33 | Above
20 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 123.33 | Above
21 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 120.51 | Above
22 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 120.51 | Above
23 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 118.58 | Below
24 | 34861_at | golgi autoantigen golgin subfamily a3 | GOLGA3 | D63997 | [illegible] | Above
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 114.13 | Above
26 | 40113_at | GS3955 protein | GS3955 | D87119 | 114.13 | Above
27 | 36179_at | mitogen-activated protein kinase-activated protein kinase 2 | MAPKAPK2 | U12779 | 113.43 | Above
28 | 37493_at | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB | H04668 | 113.04 | Above
29 | 578_at | Human recombination activating protein (RAG2) gene | RAG2 | M94633 | 111.32 | Above
30 | 41017_at | myosin-binding protein H | MYBPH | U27266 | 109.73 | Above
31 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 108.51 | Above
32 | 38679_g_at | small nuclear ribonucleoprotein polypeptide E | SNRPE | AA733050 | 106.02 | Above
33 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 105.65 | Below
34 | 34783_s_at | BUB3 budding uninhibited by benzimidazoles 3 yeast homolog | BUB3 | AF047473 | 103.87 | Above
35 | 36959_at | ubiquitin-conjugating enzyme E2 variant 1 | UBE2V1 | U49278 | 103.87 | Above
36 | 39864_at | cold inducible RNA-binding protein | CIRBP | D78134 | 99.76 | Below
37 | 41862_at | KIAA0056 protein | KIAA0056 | D29954 | 99.76 | Above
38 | 41425_at | Friend leukemia virus integration 1 | FLI1 | M98833 | 96.47 | Above
39 | 37177_at | CD58 antigen lymphocyte function-associated antigen 3 | CD58 | Y00636 | 93.84 | Above
40 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 93.17 | Above
Table 4: Genes selected by Chi square for Hyperdiploid >50

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 | 52.43 | Above
2 | 37350_at | Human DNA sequence from clone 889N15 on chromosome Xq22.1- | PSMD10 | AL031177 | 48.71 | Above
3 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | U56833 | 45.80 | Above
4 | 37677_at | [illegible] | PGK1 | V00572 | 45.80 | Above
5 | 41724_at | accessory proteins [illegible] |  |  | [illegible] | Above
6 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 44.07 | Above
7 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | [illegible] | Above
8 | 40480_s_at | FYN oncogene related to SRC | FYN | M14333 | 43.57 | Above
9 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | 43.20 | Above
10 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | U01923 | 43.15 | Above
11 | 31492_at | muscle specific gene | M9 | AB019392 | 43.01 | Below
12 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 41.10 | Above
13 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 40.88 | Above
14 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | 40.52 | Above
15 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | APT6M8-9 | AL049929 | 40.33 | Above
16 | [illegible] | phosphoribosyl pyrophosphate synthetase 1 | PRPS1 | D00860 | 40.33 | Above
17 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 40.29 | Above
18 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 38.74 | Above
19 | 38604_at | neuropeptide Y | NPY | AI198311 | [illegible] | Above
20 | 31863_at | KIAA0179 protein | KIAA0179 | D80001 | 38.26 | Above
21 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 | 37.99 | Above
22 | 39402_at | interleukin 1 beta | IL1B | M15330 | 37.92 | Above
23 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 37.72 | Above
24 | 34753_at | synaptobrevin-like 1 | SYBL1 | X92396 | 37.72 | Above
25 | 40891_f_at | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 37.15 | Above
26 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 37.15 | Above
27 | [illegible] | [illegible] Lesch-Nyhan syndrome | HPRT1 | M31642 | 37.15 | Above
28 | [illegible] | [illegible] | DKC1 | U59151 | 36.48 | Above
29 | 36169_at | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD | NDUFA1 | [illegible] | [illegible] | Above
30 | [illegible] | [illegible] BTK-associated | SH3BP5 | AB005047 | 35.95 | Above
31 | [illegible] | transmembrane trafficking protein | TMP21 | L40397 | 35.88 | Above
32 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 35.65 | Above
33 | 34374_g_at | upstream regulatory element binding protein 1 | UREB1 | Z97054 | 35.55 | Above
34 | 36542_at | solute carrier family 9 sodium/hydrogen exchanger isoform 6 |  | AF030409 | 35.55 | Above
35 | 688_at | proteasome prosome macropain 26S subunit ATPase 1 | PSMC1 | L02426 | 35.55 | Above
36 | 955_at | calmodulin type I |  | HG1862-HT1897 | 35.55 | Above
37 | 35816_at | cystatin B stefin B | CSTB | U46692 | 35.27 | Above
38 | 38459_g_at | Human cytochrome b5 (CYB5) gene | CYB5 | L39945 | 35.18 | Above
39 | 41288_at | matrix Gla protein | MGP | AL036744 | 35.18 | Above
40 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 35.14 | Above



Table 5: Genes selected by Chi square for MLL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 34306_at | muscleblind Drosophila like | MBNL | AB007888 | 64.07 | Above
2 | 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | 62.85 | Above
3 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 | LGALS1 | AI535946 | 57.97 | Above
4 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | 57.97 | Above
5 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 55.22 | Above
6 | 32193_at | plexin C1 | PLXNC1 | AF030339 | 53.59 | Above
7 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 53.40 | Above
8 | 36777_at | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | 51.47 | Above
9 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 50.73 | Below
10 | 33859_at | sin3-associated polypeptide 18kD | SAP18 | U96915 | 50.48 | Above
11 | 38391_at | capping protein actin filament gelsolin-like | CAPG | M94345 | 50.26 | Above
12 | 40763_at | Meis1 mouse homolog | MEIS1 | U85707 | 50.26 | Above
13 | 1126_s_at | cell surface glycoprotein CD44 gene | CD44 | L05424 | 50.17 | Above
14 | 34721_at | FK506-binding protein 5 | FKBP5 | U42031 | 50.17 | Above
15 | 37809_at | homeo box A9 | HOXA9 | U41813 | 50.17 | Above
16 | 34861_at | golgi autoantigen golgin subfamily a 3 | GOLGA3 | D63997 | 47.58 | Below
17 | 38194_s_at | immunoglobulin kappa constant | IGKC | M63438 | 46.18 | Below
18 | 657_at | protocadherin gamma subfamily C3 | PCDHGC3 | L11373 | 46.05 | Above
19 | 36918_at | guanylate cyclase 1 soluble alpha 3 | GUCY1A3 | Y15723 | 43.90 | Above
20 | 32215_i_at | KIAA0878 protein | KIAA0878 | AB020685 | 43.90 | Above
21 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 43.90 | Above
22 | 38413_at | defender against cell death 1 | DAD1 | D15057 | 43.90 | Above
23 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 43.82 | Below
24 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | 43.82 | Below
25 | 2036_s_at | CD44 antigen homing function and Indian blood group system | CD44 | M59040 | 42.55 | Above
26 | 40522_at | glutamate-ammonia ligase glutamine synthase | GLUL | X59834 | 42.55 | Above
27 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 42.34 | Above
28 | 40067_at | E74-like factor 1 ets domain transcription factor | ELF1 | M82882 | [illegible] | Above
29 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | 39.95 | Below
30 | 36940_at | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | [illegible] | [illegible] | Below
31 | 36935_at | RAS p21 protein activator GTPase activating protein 1 | RASA1 | M23379 | 38.77 | Above
32 | 32134_at | testin | DKFZP586 | AL050162 | 38.77 | Above
33 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 |  | AL049397 | 38.77 | Above
34 | 40493_at | Human cell surface glycoprotein CD44 | CD44 | L05424 | 38.44 | Above
35 | 769_s_at | annexin A2 | ANXA2 | D00017 | 37.61 | Above
36 | 40415_at | acetyl-Coenzyme A acyltransferase 1 peroxisomal 3-oxoacyl-Coenzyme A thiolase | ACAA1 | X14813 | 37.55 | Above
37 | 35983_at | hypothetical protein R32184_1 | R32184_1 | AC004528 | 37.55 | Above
38 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 36.56 | Above
39 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | X62055 | 36.56 | Above
40 | 41234_at | DnaJ Hsp40 homolog subfamily B member 6 | DNAJB6 | AI540318 | 36.56 | Above
Table 6: Genes selected by Chi square for Novel risk group

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 175.82 | Above
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
3 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
4 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
5 | 41074_at | G protein-coupled receptor 49 | GPR49 | AF062006 | 139.36 | Above
6 | [illegible] | G protein-coupled receptor 49 | GPR49 | AI743745 | 139.36 | Above
7 | 34676_at | KIAA1099 protein | KIAA1099 | [illegible] | [illegible] | Above
8 | 36139_at | DKFZP586G0522 protein | DKFZP586G0522 | [illegible] | [illegible] | Above
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | 120.79 | Above
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 115.15 | Above
11 | 40081_at | phospholipid transfer protein | PLTP | L26232 | 108.33 | Above
12 | [illegible] | [illegible] alpha mRNA [illegible] UTR partial | RXR | U66306 | 107.39 | Above
13 | [illegible] | cannabinoid receptor 1 brain |  | U73304 | 107.39 | Above
14 | 39878_at | protocadherin 9 |  | [illegible] | [illegible] | Above
15 | 41747_s_at | Human myocyte-specific enhancer factor 2A (MEF2A) gene, last coding exon, and complete cds | MEF2A | [illegible] | [illegible] | Above
16 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 96.17 | Above
17 | 34947_at | phorbolin-like protein MDS019 |  |  | 93.59 | Above
18 | 36029_at | chromosome 11 open reading frame 8 | C11ORF8 | [illegible] | [illegible] | Above
19 | 41708_at | KIAA1034 protein | KIAA1034 | AB028957 | 92.60 | Above
20 | 1664_at | insulin-like growth factor 2 | IGF2 | HG3543-HT3739 | 92.60 | Above
21 | 32736_at | HSPC022 protein | HSPC022 | W68830 | 91.62 | Below
22 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 86.95 | Above
23 | 36566_at | cystinosis nephropathic | CTNS | AJ222967 | 82.89 | Above
24 | 1825_at | IQ motif containing GTPase activating protein 1 | IQGAP1 | L33075 | 81.20 | Below
25 | 1711_at | platelet-derived growth factor receptor alpha polypeptide | PDGFRA | M21574 | 78.22 | Above
26 | 37023_at | lymphocyte cytosolic protein 1 L-plastin | LCP1 | J02923 | 78.22 | Below
27 | 33037_at | carbohydrate N-acetylglucosamine 6-O sulfotransferase 7 | CHST7 | AL022165 | 76.00 | Above
28 | 33411_g_at | integrin alpha 6 | ITGA6 | S66213 | 75.47 | Above
29 | 538_at | CD34 antigen | CD34 | S53911 | 74.86 | Above
30 | 39108_at | lanosterol synthase 2 3-oxidosqualene-lanosterol cyclase | LSS | U22526 | 71.90 | Above
31 | 38364_at | BCE-1 protein | BCE-1 | AF068197 | 71.90 | Above
32 | 40423_at | KIAA0903 protein | KIAA0903 | AB020710 | 71.29 | Above
33 | 35192_at | glycine dehydrogenase decarboxylating glycine decarboxylase glycine cleavage system protein P | GLDC | D90239 | 71.29 | Above
34 | 39037_at | myeloid/lymphoid or mixed-lineage leukemia trithorax Drosophila homolog translocated to 2 | MLLT2 | L13773 | 71.29 | Above
35 | 38747_at | Human CD34 gene, exon 8. | CD34 | M81945 | 69.45 | Above
36 | 37687_i_at | Fc fragment of IgG low affinity IIa receptor for CD32 | FCGR2A | M31932 | 67.75 | Above
37 | 1857_at | MAD mothers against decapentaplegic Drosophila homolog 7 | MADH7 | AF010193 | 66.28 | Above
38 | 38618_at | Human PAC clone RP3-515N1 from 22q11.2-q22 | LIMK2 | AC002073 | 64.03 | Above
39 | 31782_at | prostaglandin D2 receptor DP | PTGDR | U31099 | 61.92 | Above
40 | 32842_at | B-cell CLL/lymphoma 7A | BCL7A | X89984 | 61.57 | Above



Table 7. Genes selected by Chi square for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 215.00 | Above
2 | 1096_g_at | CD19 antigen | CD19 | M28170 | 206.48 | Below
3 | 38242_at | B cell linker protein | SLP65 | AF068180 | 198.52 | Below
4 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 197.71 | Above
5 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | 197.71 | Below
6 | 38017_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 197.53 | Below
7 | 35016_at | Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3). | M13560 | M13560 | 197.53 | Below
8 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | [illegible] | Above
9 | 38095_i_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 191.09 | Below
10 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | 189.78 | Below
11 | 38147_at | SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome | SH2D1A | AL023657 | 189.78 | Above
12 | 41723_s_at | major histocompatibility complex class II DR beta 1 | HLA-DRB1 | M32578 | 189.25 | Below
13 | 38833_at | Human mRNA for SB classII histocompatibility antigen alpha-chain |  | X00457 | 189.03 | Below
14 | [illegible] | Human T-lymphocyte specific protein tyrosine kinase p56lck (lck) aberrant mRNA | LCK |  | [illegible] | Above
15 | 37039_at | major histocompatibility complex class II DR alpha | HLA-DRA | J00194 | 188.93 | Below
16 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 188.93 | Above
17 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | 187.25 | Below
18 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 182.38 | Below
19 | [illegible] | lymphocyte-specific protein tyrosine kinase | LCK | [illegible] | [illegible] | Above
20 | 1105_s_at | T cell receptor beta locus | TRB | M12886 | 180.45 | Above
21 | 32649_at | transcription factor 7 T-cell [illegible] | TCF7 | X59871 | 177.84 | Above
22 | 38949_at | protein kinase C theta | PRKCQ | L01087 | 172.59 | Below
23 | 39709_at | selenoprotein W 1 | SEPW1 | U67171 | 171.96 | Above
24 | [illegible] | immunoglobulin heavy constant mu | IGHM |  | [illegible] | Below
25 | [illegible] | ubiquitin specific protease 20 | USP20 |  | [illegible] | Above
26 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen |  | L33930 | 165.56 | Below
27 | [illegible] | forkhead box O1A [illegible] | FOXO1A | [illegible] | 165.29 | Below
28 | [illegible] | integral membrane protein 2A | ITM2A | [illegible] | 164.14 | Above
29 | [illegible] | Human DNA sequence from clone RP3-377H14 on chromosome 6p21.32-22.1. |  | [illegible] | 164.14 | Below
30 | 1085_s_at | phospholipase C gamma 2 phosphatidylinositol-specific | PLCG2 | M37238 | 161.30 | Below
31 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 160.51 | Below
32 | [illegible] | nucleobindin 2 |  | [illegible] | [illegible] | Above
33 | 41166_at | immunoglobulin heavy constant mu | IGHM | X58529 | 158.50 | Below
34 | 38415_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
35 | 38893_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 155.78 | Below
36 | 1241_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
37 | [illegible] | T cell receptor beta locus | TRB | X00437 | 155.43 | Above
38 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 | 152.16 | Below
39 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 151.93 | Above
40 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 151.86 | Below
Table 8. Genes selected by Chi square for TEL-AML1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 38652_at | hypothetical protein FLJ20154 | FLJ20154 | AF070644 | 137.92 | Above
2 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | [illegible] | 131.43 | Above
3 | [illegible] | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | [illegible] | 130.17 | Above
4 | [illegible] | piccolo presynaptic cytomatrix protein | PCLO | [illegible] | 126.79 | Above
5 | 36985_at | isopentenyl-diphosphate delta | IDI1 | X17025 | 125.47 | Above
6 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 115.72 | Above
7 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 |  | U69883 | 112.87 | Above
8 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | 108.45 | Above
9 | 32224_at | KIAA0769 gene product | KIAA0769 | AB0183[illegible] | 107.08 | Above
10 | 32730_at | Homo sapiens mRNA for KIAA1750 protein partial cds |  | AL080059 | 104.93 | Above
11 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | 104.83 | Above
12 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 102.90 | Above
13 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | 100.67 | Above
14 | 34194_at | Homo sapiens cDNA FLJ21697 fis clone COL09740 |  |  | [illegible] | Above
15 | 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | 96.91 | Below
16 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 96.68 | Above
17 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | [illegible] | Above
18 | 41814_at | fucosidase alpha-L-1 tissue | FUCA1 | [illegible] | 92.77 | Above
19 | 41200_at | CD36 antigen collagen type I receptor thrombospondin receptor like 1 | CD36L1 | Z22555 | [illegible] | Above
20 | 35238_at | TNF receptor-associated factor 5 | TRAF5 | AB000509 | 90.81 | Above
21 | [illegible] | FK506-binding protein 1A 12kD | FKBP1A | M34539 | 86.69 | Above
22 | 33690_at | Homo sapiens mRNA cDNA DKFZp434A202 from clone DKFZp434A202 |  | AL080190 | 86.69 | Above
23 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 85.44 | Above
24 | 35362_at | myosin X | MYO10 | AB018342 | [illegible] | Above
25 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 83.25 | Above
26 | 40279_at | KIAA0121 gene product | KIAA0121 | D50911 | 81.66 | Above
27 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 81.66 | Above
28 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 81.17 | Above
29 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 80.37 | Above
30 | 769_s_at | annexin A2 | ANXA2 | D00017 | 78.68 | Below
31 | 33415_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | 77.04 | Below
32 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in | NME2 |  | 76.[illegible] | Below
33 | 32579_at | SWI/SNF related matrix regulator of chromatin subfamily a member 4 | SMARCA4 | [illegible] | [illegible] | Above
34 | 39425_at | thioredoxin reductase 1 |  | X91247 | 75.97 | Above
35 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | [illegible] | [illegible] | Above
36 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 75.11 | Above
37 | 1336_s_at | protein kinase C beta 1 | PRKCB1 | X06318 | 73.96 | Above


38 


41097_at 


telomeric repeat binding factor 2 


TERF2 


AF002999 


73.84 


Above 


39 


31786_at 


Sam68-like phosphotyrosine 


T-STAR 


AF051321 


73.72 


Above 






protein T-STAR 






73.66 


Above 


40 


160029_at 


protein kinase C beta 1 


PRKCB1 


X07109 



2. Correlation-based Feature Selection (CFS)
Correlation-based Feature Selection (CFS) is a method that evaluates
subsets of genes rather than individual genes (Hall and Holmes
(2000), "Benchmarking Attribute Selection Techniques for Data Mining," Working
Paper 00/10, Department of Computer Science, University of Waikato, New Zealand).
The core of the algorithm is a subset evaluation heuristic that takes into account the
usefulness of individual features for predicting the class along with the level of
intercorrelation among them, with the belief that "good feature subsets contain
features highly correlated with the class, yet uncorrelated with each other". The
heuristic assigns a score Merit_S to a subset S containing k genes, defined as
Merit_S = (k * r_cf)/sqrt(k + k * (k - 1) * r_ff), where r_cf is the average
gene-class correlation and r_ff is the average gene-gene correlation. Like the
Chi square method, CFS first discretizes the gene expressions into intervals and
then calculates a matrix of gene-class and gene-gene correlations from the
training data for merit calculation. The correlation between two genes or a gene
and a class is calculated as r_xy = 2 * [H(X) + H(Y) - H(X,Y)]/[H(X) + H(Y)],
where H(X) is the entropy of a gene X and H(X,Y) is the joint entropy of X and Y.
CFS starts from an empty set of genes and uses the best-first search technique
with a stopping criterion of 5 consecutive fully expanded non-improving subsets.
The subset with the highest merit found during the search is selected. Tables 9-15
list the top gene subsets chosen by CFS for each subtype. For subtype prediction,
each gene subset must be used in its entirety, as within each subset, all genes
are equally ranked.
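
The merit and correlation formulas above can be illustrated with a minimal Python sketch. It assumes the expression values have already been discretized into interval labels; the function names (entropy, symmetric_uncertainty, cfs_merit, best_first_cfs) and the max_stale parameter are hypothetical names chosen for illustration, and the search is a simplified forward best-first search, not the implementation used to produce Tables 9-15.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """r_xy = 2 * [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; treat as uncorrelated
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def cfs_merit(genes, labels):
    """Merit_S = (k * r_cf) / sqrt(k + k * (k - 1) * r_ff) for a gene subset."""
    k = len(genes)
    r_cf = sum(symmetric_uncertainty(g, labels) for g in genes) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetric_uncertainty(genes[i], genes[j])
                   for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_first_cfs(genes, labels, max_stale=5):
    """Forward best-first search from the empty set.

    Stops after max_stale consecutive expansions that fail to improve
    on the best merit seen so far (assumed reading of the stopping rule).
    """
    open_list = [(0.0, frozenset())]
    visited = {frozenset()}
    best_subset, best_merit, stale = frozenset(), 0.0, 0
    while open_list and stale < max_stale:
        open_list.sort(key=lambda item: item[0], reverse=True)
        _, current = open_list.pop(0)  # most promising subset so far
        improved = False
        for i in range(len(genes)):
            child = current | {i}
            if i in current or child in visited:
                continue
            visited.add(child)
            m = cfs_merit([genes[j] for j in sorted(child)], labels)
            open_list.append((m, child))
            if m > best_merit + 1e-12:
                best_subset, best_merit, improved = child, m, True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_merit
```

In this sketch a gene that duplicates one already selected does not raise the merit, because the gain in gene-class correlation is offset by the gene-gene correlation term in the denominator.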



Table 9. Genes selected by CFS for BCR-ABL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 36650_at | cyclin D2 | CCND2 | D13639 | Above
2 | 40196_at | HYA22 protein | HYA22 | D88153 | Above
3 | 1635_at | proto-oncogene tyrosine-protein kinase (ABL) gene | ABL | U07563 | Above
4 | 33775_s_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98176 | Above
5 | 1636_g_at | proto-oncogene tyrosine-protein kinase (ABL) gene | ABL | U07563 | Above
6 | 41295_at | GTT1 protein | GTT1 | AL041780 | Above
7 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | Above
8 | 33150_at | disrupter of silencing 10 | SAS10 | AI126004 | Above
9 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | Above
10 | 39061_at | bone marrow stromal cell antigen 2 | BST2 | D28137 | Above
11 | 33172_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | Above
12 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | Above
13 | 317_at | protease cysteine 1 legumain | PRSC1 | D55696 | Above
14 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | Above
15 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | Above
16 | 39044_s_at | diacylglycerol kinase delta 130kD | DGKD | D73409 | Below
17 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | Above
18 | 38641_at | Homo sapiens mRNA for TSC-22-like protein | - | AJ133115 | Above
19 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | Above
20 | 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | Above
21 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | Above
22 | 36035_at | anchor attachment protein 1 Gaa1p yeast homolog | GPAA1 | AB002135 | Above
23 | 980_at | Niemann-Pick disease type C1 | [illegible] | - | Above
24 | 40698_at | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | Above
25 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | Above
26 | 2001_g_at | ataxia telangiectasia mutated includes complementation groups A C and D | ATM | U26455 | Above
27 | 39319_at | lymphocyte cytosolic protein 2 SH2 domain-containing leukocyte protein of 76kD | - | - | Above
28 | 37685_at | Clathrin assembly lymphoid-myeloid | CLTH | U45976 | Above
29 | 33813_at | tumor necrosis factor receptor superfamily member 1B | TNFRSF1B | AI813532 | Above
30 | 33134_at | adenylate cyclase 3 | ADCY3 | AB011083 | Above
31 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | Above
32 | 36985_at | isopentenyl-diphosphate delta isomerase | IDI1 | X17025 | Below
33 | 35991_at | Sm protein F | LSM6 | AA917945 | Above
34 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | Above
35 | 37470_at | leukocyte-associated Ig-like receptor 1 | LAIR1 | AF013249 | Above
36 | 39245_at | Human 40871 mRNA partial sequence | - | U72507 | Above
37 | 40076_at | tumor protein D52-like 2 | TPD52L2 | AF004430 | Below
38 | 39370_at | Microtubule-associated proteins 1A and 1B light chain 3 | MAP1ALC3 | W28807 | Below
39 | [illegible] | Janus kinase 1 a protein tyrosine kinase | - | [illegible] | Above
40 | [illegible] | amino-terminal enhancer of split | - | - | Below
41 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | Above
42 | 33924_at | KIAA1091 protein | KIAA1091 | AB029014 | Above
43 | 37397_at | platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene | PECAM | L34657 | Above
44 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | Below
45 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | Above
46 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | Above
47 | 32621_at | down-regulator of transcription 1 TBP-binding negative cofactor 2 | DR1 | M97388 | Above
48 | 40108_at | KIAA0005 gene product | KIAA0005 | D13630 | Below
49 | 35238_at | TNF receptor-associated factor 5 | TRAF5 | AB000509 | Above
50 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | Above
51 | 1373_at | transcription factor 3 E2A immunoglobulin enhancer binding factors E12/E47 | TCF3 | - | Below
52 | 35731_at | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | X16983 | Above
53 | 38659_at | suppressor of clear C. elegans homolog of | SHOC2 | AB020669 | Below












Table 10. Gene selected by CFS for E2A-PBX1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | Above












Table 11. Genes selected by CFS for Hyperdiploid >50

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 | Above
2 | 37350_at | clone 889N15 on chromosome Xq22.1-22.3. Contains part of the gene for a novel protein similar to X. laevis Cortical Thymocyte Marker CTX | PSMD10 | AL031177 | Above
3 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | X81109 | Above
4 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | Above
5 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | Above
6 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | Above
7 | 31492_at | muscle specific gene | M9 | AB019392 | Below
8 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | Above
9 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | [illegible]
10 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | Above
11 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | Above
12 | 34374_g_at | upstream regulatory element binding protein 1 | UREB1 | Z97054 | Above
13 | 688_at | proteasome prosome macropain 26S subunit ATPase 1 | PSMC1 | L02426 | Above
14 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | Below
15 | 38771_at | histone deacetylase 1 | HDAC1 | D50405 | Below
16 | 865_at | ribosomal protein S6 kinase 90kD polypeptide 3 | RPS6KA3 | U08316 | Above
17 | 41143_at | calmodulin (CALM1) gene | CALM1 | U12022 | Above
18 | 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | Below
19 | 41470_at | prominin mouse like 1 | PROML1 | AF027208 | Above
20 | 41503_at | KIAA0854 protein | KIAA0854 | AB020661 | Below
21 | 2039_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | Above
22 | 36845_at | KIAA0136 protein | KIAA0136 | D50926 | Above
23 | 36940_at | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | D86970 | Above
24 | 32236_at | ubiquitin-conjugating enzyme E2G 2 homologous to yeast UBC7 | UBE2G2 | AF032456 | Above
25 | 36885_at | spleen tyrosine kinase | SYK | L28824 | Below
26 | 40200_at | heat shock transcription factor 1 | HSF1 | M64673 | Below
27 | 40842_at | U1 snRNP-specific protein A gene | SNRPA | M60784 | Below
28 | 40514_at | hypothetical 43.2 Kd protein | LOC51614 | AF091085 | Below
29 | 41222_at | signal transducer and activator of transcription 6 (STAT6) gene | STAT6 | AF067575 | Below
30 | 1294_at | ubiquitin-activating enzyme E1-like | UBE1L | L13852 | Below
31 | 34315_at | AFG3 ATPase family gene 3 yeast like 2 | AFG3L2 | Y18314 | Above
32 | 39806_at | DKFZP547E2110 protein | DKFZP547E2110 | AL050261 | Above
33 | 40875_s_at | small nuclear ribonucleoprotein 70kD polypeptide RNP antigen | SNRP70 | X06815 | Below
34 | 38458_at | cytochrome b5 (CYB5) gene | CYB5 | L39945 | Above
35 | 1817_at | prefoldin 5 | PFDN5 | D89667 | Below
36 | 34709_r_at | stromal antigen 2 | STAG2 | Z75331 | Above
37 | 33447_at | myosin light polypeptide regulatory non-sarcomeric 20kD | MLCB | X54304 | Above
38 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | Below
39 | 1915_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | V01512 | Above
40 | 38854_at | KIAA0635 gene product | KIAA0635 | AB014535 | Above
41 | 37732_at | RING1 and YY1 binding protein | RYBP | AL049940 | Above
42 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | Above
43 | 34733_at | splicing factor 3a subunit 1 120kD | SF3A1 | X85237 | Below
44 | 245_at | selectin L lymphocyte adhesion molecule 1 | SELL | M25280 | Below
45 | 40146_at | RAP1B member of RAS oncogene family | RAP1B | AL080212 | Below
46 | 40104_at | serine/threonine kinase 25 Ste20 yeast homolog | STK25 | D63780 | Below
47 | 430_at | nucleoside phosphorylase | NP | X00737 | Above
48 | 36899_at | special AT-rich sequence binding protein 1 binds to nuclear matrix/scaffold-associating DNA s | SATB1 | M97287 | Below
49 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AJ249721 | Below
50 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | Below
51 | 36107_at | ATP synthase H transporting mitochondrial F0 complex subunit F6 | ATP5J | AA845575 | Above
52 | 38789_at | transketolase Wernicke-Korsakoff syndrome | TKT | L12711 | Below
53 | 39301_at | calpain 3 p94 | CAPN3 | X85030 | Below
54 | [illegible] | [illegible] | BAF53A | AF041474 | Below
55 | 41162_at | protein phosphatase 1G formerly 2C magnesium-dependent gamma isoform | [illegible] | Y13936 | Below
56 | 37819_at | hypothetical protein | LOC54104 | AF007130 | Below
57 | 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | Below
58 | 40019_at | ecotropic viral integration site 2B | EVI2B | M60830 | Above
59 | 39489_g_at | protocadherin 9 | PCDH9 | W27720 | Below
60 | 857_at | protein phosphatase 1A formerly 2C magnesium-dependent alpha isoform | PPM1A | [illegible] | Above
61 | 32804_at | RNA binding motif protein 5 | RBM5 | AF091263 | Below
62 | [illegible] | phosphodiesterase 8A | PDE8A | AF056490 | Below
63 | 1519_at | v-ets avian erythroblastosis virus E26 oncogene homolog 2 | ETS2 | J04102 | Above
64 | 37680_at | A kinase PRKA anchor protein gravin 12 | AKAP12 | U81607 | Below
65 | 548_s_at | spleen tyrosine kinase | SYK | S80267 | Below
66 | 39797_at | KIAA0349 protein | KIAA0349 | - | Above
67 | 32789_at | nuclear cap binding protein subunit 2 20kD | NCBP2 | [illegible] | Below
68 | 38091_at | lectin galactoside-binding soluble 9 galectin 9 | LGALS9 | Z49107 | Below
69 | 41223_at | cytochrome c oxidase subunit Va | COX5A | M22760 | Below
70 | 933_f_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
71 | 37012_at | capping protein actin filament muscle Z-line beta | CAPZB | U03271 | Below
72 | 35214_at | UDP-glucose dehydrogenase | UGDH | [illegible] | Above
73 | 32434_at | myristoylated alanine-rich protein kinase C substrate MARCKS 80K-L | MACS | D10522 | Above
74 | 38345_at | centrosomal protein 1 | CEP1 | AF083322 | Below
75 | 40404_s_at | CDC16 cell division cycle 16 S. cerevisiae homolog | CDC16 | U18291 | Below
76 | 39096_at | SON DNA binding protein | SON | AB028942 | Above
77 | 33429_at | DKFZP586M1523 protein | DKFZP586M1523 | AL050225 | Above
78 | 40641_at | TBP-associated factor 172 | TAF-172 | AF038362 | Above
79 | 41381_at | KIAA0308 protein | KIAA0308 | AB002306 | Below
80 | 35135_at | Homo sapiens Similar to CG15084 gene product clone MGC10471 mRNA complete cds | - | X13956 | Below
81 | 39421_at | runt-related transcription factor 1 acute myeloid leukemia 1 aml1 oncogene | RUNX1 | D43969 | Below
82 | 195_s_at | caspase 4 apoptosis-related cysteine protease | CASP4 | U28014 | Below
83 | 36898_r_at | primase polypeptide 2A 58kD | PRIM2A | X74331 | Above
84 | 38792_at | spermine synthase | SMS | AD001528 | Above
85 | 32643_at | glucan 1 4-alpha- branching enzyme 1 glycogen branching enzyme Andersen disease glycogen storage disease type IV | GBE1 | L07956 | Below
86 | 38808_at | cell membrane glycoprotein 110000Mr surface antigen | GP110 | D64154 | Below
87 | 36062_at | Leupaxin | LPXN | AF062075 | Below
88 | 300_f_at | transcription factor BTF3 homolog (GB:M90355) | - | HG4518-HT4921 | Below
89 | 1979_s_at | nucleolar protein 1 120kD | NOL1 | X55504 | Below
90 | 32230_at | eukaryotic translation initiation factor 3 subunit 2 beta 36kD | EIF3S2 | U39067 | Below
91 | 39893_at | guanine nucleotide binding protein G protein gamma 7 | GNG7 | AB010414 | Below
92 | 34651_at | catechol-O-methyltransferase | COMT | M58525 | Above
93 | 1052_s_at | CCAAT/enhancer binding protein C/EBP delta | CEBPD | M83667 | Below
94 | 36272_r_at | peripheral myelin protein 2 | PMP2 | X62167 | Below
95 | 2044_s_at | retinoblastoma 1 including osteosarcoma | RB1 | M15400 | Below
96 | 32135_at | sterol regulatory element binding transcription factor 1 | SREBF1 | U00968 | Below



Table 12. Genes selected by CFS for MLL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 34306_at | muscleblind Drosophila like | MBNL | AB007888 | Above
2 | 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | Above
3 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | Above
4 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | Above
5 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | Above
6 | 32193_at | plexin C1 | PLXNC1 | AF030339 | Above
7 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | Above
8 | 36777_at | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | Above
9 | 38391_at | capping protein actin filament gelsolin-like | CAPG | M94345 | Above
10 | 40763_at | Meis1 mouse homolog | MEIS1 | [illegible] | Above
11 | 34721_at | FK506-binding protein 5 | FKBP5 | U42031 | Above
12 | 37809_at | homeo box A9 | HOXA9 | U41813 | Above
13 | - | KIAA0878 protein | KIAA0878 | AB020685 | Above
14 | [illegible] | lymphocyte antigen 75 | LY75 | AF011333 | Above
15 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | Below
16 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | Below
17 | 40522_at | glutamate-ammonia ligase glutamine synthase | GLUL | X59834 | Above
18 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | Above
19 | 40067_at | E74-like factor 1 ets domain transcription factor | ELF1 | M82882 | Above
20 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | Below
21 | 32134_at | Testin | DKFZP586B2022 | AL050162 | Above
22 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | - | AL049397 | Above
23 | 40415_at | acetyl-Coenzyme A acyltransferase 1 peroxisomal 3-oxoacyl-Coenzyme A thiolase | ACAA1 | X14813 | Above
24 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | Above
25 | 33847_s_at | cyclin-dependent kinase inhibitor 1B p27 Kip1 | CDKN1B | U10906 | Above
26 | 32696_at | pre-B-cell leukemia transcription factor 3 | PBX3 | X59841 | Above
27 | 40417_at | KIAA0098 protein | - | D43950 | Above
28 | 1644_at | eukaryotic translation initiation factor 3 subunit 2 beta 36kD | EIF3S2 | U36764 | Above
29 | 948_s_at | peptidylprolyl isomerase D cyclophilin D | PPID | D63861 | Above
30 | 34337_s_at | putative DNA binding protein | M96 | AJ010014 | Below
31 | 41747_s_at | myocyte-specific enhancer factor 2A (MEF2A) gene | MEF2A | U49020 | Above
32 | 39516_at | hypothetical protein | HSPC004 | AI827793 | Above
33 | 31820_at | hematopoietic cell-specific Lyn substrate 1 | HCLS1 | X16663 | Above
34 | 33305_at | serine or cysteine proteinase inhibitor clade B ovalbumin member 1 | SERPINB1 | M93056 | Above
35 | 40520_g_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | Above
36 | 41222_at | signal transducer and activator of transcription 6 (STAT6) gene | STAT6 | AF067575 | Above
37 | 1718_at | actin related protein 2/3 complex subunit 2 34 kD | - | - | Above
38 | 38342_at | KIAA0239 protein | KIAA0239 | D87076 | Below
39 | 38805_at | TG-interacting factor TALE family homeobox | TGIF | X89750 | Below
40 | 32089_at | sperm associated antigen 6 | SPAG6 | - | Above
41 | 1950_s_at | Smad 3, exon 1 | - | - | Above
42 | 39410_at | development and differentiation enhancing factor 2 | [illegible] | AB007860 | Above
43 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | Below
44 | 32607_at | brain acid-soluble protein 1 | BASP1 | AF039656 | Above
45 | - | CD9 antigen p24 | CD9 | M38690 | Below
46 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | - | Below
47 | 1039_s_at | hypoxia-inducible factor 1 alpha subunit basic helix-loop-helix transcription factor | HIF1A | [illegible] | Below
48 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | Below
49 | 963_at | ligase IV DNA ATP-dependent | LIG4 | [illegible] | Below
50 | 39628_at | RAB9 member RAS oncogene family | RAB9 | U44103 | Below
51 | 38242_at | B cell linker protein | SLP65 | AF068180 | Below
52 | 37692_at | diazepam binding inhibitor GABA receptor modulator acyl-Coenzyme A binding protein | DBI | AI557240 | Above
53 | 32166_at | KIAA1027 protein | KIAA1027 | AB028950 | Above
54 | 34800_at | DKFZP586O1624 protein | DKFZP586O1624 | AL039458 | Below
55 | 34386_at | methyl-CpG binding domain protein 4 | MBD4 | AF072250 | Below
56 | 40296_at | hypothetical protein | 753P9 | AL023653 | Below
57 | 40456_at | up-regulated by BCG-CWS | LOC64116 | AL049963 | Above
58 | [illegible] | ferritin heavy polypeptide 1 | FTH1 | L20941 | Below
59 | 39049_at | G18.1a and G18.1b proteins (G18.1a and G18.1b genes, located in the class III region of the major histocompatibility complex) | - | AJ243937 | Below
60 | 38075_at | synaptophysin-like protein | SYPL | X68194 | Above
61 | 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
62 | 1825_at | IQ motif containing GTPase activating protein 1 | IQGAP1 | L33075 | Above
63 | 34210_at | CDW52 antigen CAMPATH-1 antigen | CDW52 | N90866 | Below
64 | 39778_at | mannosyl alpha-1 3-glycoprotein beta-1 2-N-acetylglucosaminyltransferase | MGAT1 | M55621 | Below
65 | 34699_at | CD2-associated protein | CD2AP | AL050105 | Below
66 | 40066_at | ubiquitin-activating enzyme E1C | UBE1C | AF046024 | Above
67 | 41177_at | hypothetical protein FLJ12443 | FLJ12443 | AW024285 | Above
68 | 32736_at | HSPC022 protein | HSPC022 | W68830 | Above
69 | 1928_s_at | mad protein homolog Smad2 gene | Smad2 | U78733 | Below
70 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | Above
71 | 37345_at | Calumenin | CALU | AF013759 | Above
72 | 34099_f_at | nucleosome assembly protein 1-like 1 | NAP1L1 | W26056 | Above
73 | 933_f_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
74 | 32214_at | thioredoxin-like 32kD | TXNL | AF003938 | Below
75 | 33501_r_at | SNC73 protein SNC73 mRNA complete cds | - | S71043 | Below
76 | 950_at | translocation protein 1 | TLOC1 | D87127 | Below
77 | 41161_at | death-associated protein 6 | DAXX | AB015051 | Below
78 | 41381_at | KIAA0308 protein | KIAA0308 | AB002306 | Below
79 | 38705_at | ubiquitin-conjugating enzyme E2D 2 homologous to yeast UBC4/5 | UBE2D2 | [illegible] | Above
80 | 38617_at | LIM domain kinase 2 | LIMK2 | D45906 | Below
81 | [illegible] | poly rC binding protein 1 | PCBP1 | Z29505 | Above
82 | [illegible] | solute carrier family 25 mitochondrial carrier adenine nucleotide translocator member 6 | SLC25A6 | [illegible] | Above
83 | 1827_s_at | c-myc-P64 mRNA, initiating from promoter P0 | - | M13929 | Above
84 | 38479_at | acidic protein rich in leucines | SSP29 | Y07969 | Below
85 | 33207_at | DnaJ Hsp40 homolog subfamily C member 3 | DNAJC3 | AI095508 | Below
86 | 39039_s_at | CGI-76 protein | LOC51632 | AI557497 | Below
87 | 32157_at | protein phosphatase 1 catalytic subunit alpha isoform | PPP1CA | S57501 | Above
88 | 905_at | guanylate kinase 1 | GUK1 | L76200 | Below
89 | [illegible] | KIAA0942 protein | KIAA0942 | - | Below
90 | 1007_s_at | discoidin domain receptor family member 1 | DDR1 | U48705 | Below
91 | - | tumor necrosis factor receptor superfamily member 14 herpesvirus entry mediator | TNFRSF14 | U70321 | Below
92 | 36634_at | BTG family member 2 | BTG2 | U72649 | Below
93 | 38760_f_at | butyrophilin subfamily 3 member A2 | BTN3A2 | U90546 | Below



Table 13. Genes selected by CFS for Novel Class

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | Above
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
3 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
4 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
5 | 41074_at | G protein-coupled receptor 49 | GPR49 | AF062006 | Above
6 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | Above
7 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | Above
8 | 36139_at | DKFZP586G0522 protein | DKFZP586G0522 | AL050289 | Above
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | Above
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | Above
11 | 32800_at | retinoid X receptor alpha mRNA | - | U66306 | Above
12 | 1664_at | insulin-like growth factor 2 | IGF2 | HG3543-HT3739 | Above
13 | 36566_at | cystinosis nephropathic | CTNS | AJ222967 | Above



Table 14. Gene selected by CFS for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | Above



Affymetrix 
number 

38652_at 
36239_at 

41442 at 



Table 15. Genes selected by CFS for TEL-AML1 

Gene Name GeneSymbol Reference 

number 



hypothetical protein FLJ20 1 54 FLJ20 1 54 

POU domain class 2 associating POU2AF1 
factor 1 

core-binding factor runt domain alpha CBFA2T3 
subunit 2 tr anslocated to 3 



4 37780_at 

5 36985_at 

6 3857S_at 

7 35614_at 

8 32224_at 

9 32730_at 

10 36937_s__at 

11 36008_at 

12 41200 at 



piccolo presynaptic cytomatrix PCLO 
protein 

isopentenyl-diphosphate delta IDI1 
isomerase 

tumor necrosis factor receptor TNFRSF7 
superfamily member 7 

transcription factor-like 5 basic helix- TCFL5 
loop-helix 

KIAA0769 gene product KIAA0769 
KIAA1750 protein 

PDZ and LIM domain 1 elfin PDLBvll 



protein tyrosine phosphatase type IVA PTP4A3 
member 3 

CD36 antigen collagen type I receptor CD36L1 
tlirombospondrn receptor like 1 



AF070644 
Z49194 

AB010419 

AB011131 

X17025 

M63928 

AB012124 

AB018312 
AL080059 
U90878 
AF041434 

Z22555 



Above/ 
Below 
Mean 

Above 

Above 
Above 

Above 
Above 
Above 

Above 

Above 
Above 
Below 
Above 

Above 



-60- 



BNSDOCID: <WO 03083 1 40A2J_> 



WO 03/083140 



PCT/US03/08486 



13 | 33690_at | DKFZp434A202 from clone DKFZp434A202 |  | AL080190 | Above
14 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | Above
15 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | Above
16 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | Above
17 | 34481_at | vav proto-oncogene | Vav | AF030227 | Above
18 | 41498_at | KIAA0911 protein | KIAA0911 | AB020718 | Above
19 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | Above
20 | 1647_at | IQ motif containing GTPase activating protein 2 | IQGAP2 | U51903 | Below
21 | 37724_at | v-myc avian myelocytomatosis viral oncogene homolog | MYC | V00568 | Below
22 | 37981_at | drebrin 1 | DBN1 | U00802 | Above
23 | 37326_at | proteolipid protein 2 colonic epithelium-enriched | PLP2 | U93305 | Below
24 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | Above
25 | 38666_at | pleckstrin homology Sec7 and coiled/coil domains 1 cytohesin 1 | PSCD1 | M85169 | Below
26 | 39039_s_at | CGI-76 protein | LOC51632 | AI557497 | Below
27 | 34819_at | CD164 antigen sialomucin | CD164 | D14043 | Below
28 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | Above
29 | 34224_at | fatty acid desaturase 3 | FADS3 | AC004770 | Above
30 | 39827_at | hypothetical protein FLJ20500 |  | AA522530 | Below
31 | 32157_at | protein phosphatase 1 catalytic subunit alpha isoform | PPP1CA | S57501 | Below
32 | 34183_at | DKFZP434C171 protein | DKFZP434C171 | AL080169 | Below
33 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | Below
34 | 38124_at | midkine neurite growth-promoting factor 2 | MDK | X55110 | Above
35 | 33304_at | interferon stimulated gene 20kD | ISG20 | U88964 | Above
36 | 41295_at | GTT1 protein | GTT1 | AL041780 | Below
37 | 40745_at | adaptor-related protein complex 1 beta 1 subunit | AP1B1 | L13939 | Above
38 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | Above
39 | 263_g_at | S-adenosylmethionine decarboxylase 1 | AMD1 | M21154 | Below
40 | 41609_at | major histocompatibility complex class II DM beta | HLA-DMB | U15085 | Above
41 | 39045_at | hypothetical protein FLJ21432 | FLJ21432 | W26655 | Below






42 | 39421_at | runt-related transcription factor 1 acute myeloid leukemia 1 aml1 | RUNX1 | D43969 | Above
43 | 34210_at | CDW52 antigen CAMPATH-1 antigen | CDW52 | N90866 | Above
44 | 37276_at | IQ motif containing GTPase activating protein 2 | IQGAP2 | U51903 | Below
45 | 38763_at | L-iditol-2 dehydrogenase gene |  | L29254 | Below
46 | 40960_at | UDP-Gal betaGlcNAc beta 1 4-galactosyltransferase polypeptide 1 | B4GALT1 | D29805 | Below
47 | 1127_at | ribosomal protein S6 kinase 90kD polypeptide 1 | RPS6KA1 | L07597 | Below
48 | [illegible] | KIAA0102 gene product | KIAA0102 | D14655 | Below
49 | 38968_at | SH3-domain binding protein 5 BTK-associated |  | AB005047 | Below
50 | 39135_at | KIAA0767 protein | KIAA0767 | AB018310 | Below
51 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | Below
52 | 1158_s_at | calmodulin 3 phosphorylase kinase delta | CALM3 | J04046 | Above
53 | 34782_at | jumonji mouse homolog | JMJ | AL021938 | Below
54 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | Below
55 | 39758_f_at | Lysosomal-associated membrane protein 1 | LAMP1 | J04182 | Below
56 | 35151_at | tumor suppressor deleted in oral | DOC-1R | AF089814 | Below
57 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | Above
58 | 40467_at | succinate dehydrogenase complex subunit D integral membrane protein | SDHD | AB006202 | Below
59 | [illegible] | S100 calcium-binding protein A13 | S100A13 | AI541308 | Below
60 | 41812_s_at | KIAA0906 protein | KIAA0906 | AB020713 | Below
61 | 34336_at | lysyl-tRNA synthetase | KARS | D32053 | Below
62 | 38336_at | KIAA1013 protein | KIAA1013 | AB023230 | Below
63 | 32253_at | arginine-glutamic acid dipeptide RE repeats | RERE | AB007927 | Below
64 | 35731_at | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | X16983 | Below
65 | [illegible] | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | Below
66 | 840_at | zinc finger protein 220 | ZNF220 | U47742 | Above
67 | 41171_at | proteasome prosome macropain activator subunit 2 PA28 beta | PSME2 | D45248 | Above
68 | 34877_at | Janus kinase 1 a protein tyrosine kinase | JAK1 | AL039831 | Above
69 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | Below
70 | 31690_at | Glutamate dehydrogenase-2 | GLUD2 | U08997 | Below






71 | 40961_at | SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 2 | SMARCA2 | X72889 | Below
72 | 38149_at | KIAA0053 gene product | KIAA0053 | D29642 | Above
73 | 2061_at | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | L12002 | Below
74 | 2012_s_at | protein kinase DNA-activated catalytic polypeptide | PRKDC | U34994 | Below
75 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | Above
76 | 34821_at | DKFZP586D0623 protein | DKFZP586D06 | AL050197 | Below
77 | 36980_at | proline-rich protein with nuclear targeting signal | B4-2 | U03105 | Below
78 | 853_at | nuclear factor erythroid-derived 2 like 2 | NFE2L2 | S74017 | Below
79 |  | caspase 1 apoptosis-related cysteine protease interleukin 1 beta convertase | CASP1 | U13697 | Below
80 | 32572_at | ubiquitin specific protease 9 X chromosome Drosophila fat facets related |  | X98296 | Below
81 | 387_at | cyclin-dependent kinase 9 CDC2-related kinase | CDK9 | X80230 | Below
82 | 35300_at | glutamyl-prolyl-tRNA synthetase | EPRS | X54326 | Below
83 | 36155_at | KIAA0275 gene product | KIAA0275 | D87465 | Below
84 | 37625_at | Interferon regulatory factor 4 | IRF4 | U52682 | Below
85 | 35763_at | KIAA0540 protein | KIAA0540 | AB011112 | Below
86 | 39077_at | DR1-associated protein 1 negative cofactor 2 alpha | DRAP1 | U41843 | Below
87 | 40132_g_at | Follistatin-like 1 | FSTL1 | D89937 | Below
88 | 32615_at | aspartyl-tRNA synthetase | DARS | J05032 | Below
89 | 38357_at | Homo sapiens mRNA cDNA DKFZp564D156 from clone DKFZp564D156 |  | AL049321 | Above
90 | 34817_s_at | ataxin 2 related protein | A2LP | U70671 | Above
91 | 40856_at | serine or cysteine proteinase inhibitor clade F alpha-2 antiplasmin pigment epithelium derived factor member 1 | SERPINF1 | U29953 | Below
92 | 39784_at | eukaryotic translation initiation factor 2 subunit 1 alpha 35kD | EIF2S1 | U26032 | Below
93 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | Below
94 | 40839_at | ubiquitin-like 3 | UBL3 | AL080177 | Below
95 | 34832_s_at | KIAA0763 gene product | KIAA0763 | AB018306 | Below
96 | 33244_at | chimerin chimaerin 2 | CHN2 | U07223 | Below
97 | 31516_f_at | basic transcription factor 3 like 1 | BTF3L1 | M90354 | Below
98 | 35266_at | bladder cancer associated protein | BLCAP | AL049288 | Above
99 | 253_g_at | (clone GPCR W) G protein-linked receptor gene (GPCR) gene |  | L42324 | Below
100 | 35227_at | retinoblastoma-binding protein 8 | RBBP8 | U72066 | Below
101 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | Below
102 | 38084_at | chromobox homolog 3 Drosophila HP1 gamma | CBX3 | AI797801 | Below
103 | 39025_at | 6.2 kd protein | LOC54543 | AI557912 | Below
104 | 32085_at | KIAA0981 protein | KIAA0981 | AB023198 | Above
105 | 38902_at | Activating transcription factor 2 | ATF2 | X15875 | Below



3. T-statistics 

The T-statistic is a classical feature selection approach. The T-statistic of a gene is 
defined as $T = |\mu_1 - \mu_2| / \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, where $\mu_i$ is the mean 
expression of that gene in the i-th class, $\sigma_i^2$ is the variance of that gene in the 
i-th class, and $n_i$ is the size of the i-th class. This formula assigns a higher value to a 
gene that has a larger mean difference between the two classes and a smaller variance 
within both classes. For BCR-ABL, hyperdiploid >50, MLL, Novel, and TEL-AML1 the top 
ranked 40 genes are listed in Tables 16, 18, 19, 20, and 22, whereas for E2A-PBX1 and 
T-ALL only the top 30 and 31 genes are shown. Additional genes that may be used in 
expression profiles to assign subjects to a leukemia risk group are shown in Tables 54-60. 
The genes in Tables 54-60 were selected on the basis of having a T-statistic value greater 
than the T-statistic value for the gene when examined as a discriminator in 999 of 1000 
permutations of the data set (p<0.001; this statistical test is described elsewhere 
herein). Of these genes, only those having T-statistic absolute values equal to or 
greater than 8 (representing a nominal p value of ~0.0001) are shown in Tables 54-60. 
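
As a minimal sketch of the calculation described above (not the implementation used to produce the tables), the following Python computes the T-statistic for a single gene and estimates a permutation p value by shuffling the class assignments. The function names, the use of the sample variance, the random seed, and the toy expression values are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance


def t_statistic(class1, class2):
    """Return |mu1 - mu2| / sqrt(var1/n1 + var2/n2) for one gene."""
    n1, n2 = len(class1), len(class2)
    # Assumption: sample variance (n - 1 denominator); the text does not say which.
    denom = math.sqrt(variance(class1) / n1 + variance(class2) / n2)
    diff = abs(mean(class1) - mean(class2))
    if denom == 0.0:
        # Degenerate case: no spread in either class.
        return math.inf if diff > 0 else 0.0
    return diff / denom


def permutation_p_value(class1, class2, n_perm=1000, seed=0):
    """Fraction of label permutations whose T is at least the observed T."""
    rng = random.Random(seed)
    observed = t_statistic(class1, class2)
    pooled = list(class1) + list(class2)
    n1 = len(class1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two classes at random
        if t_statistic(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical expression values for one gene in two classes.
in_subtype = [10.0, 10.5, 11.0, 10.2, 10.8, 10.4]
others = [5.0, 5.3, 4.9, 5.1, 5.2, 4.8]
print(t_statistic(in_subtype, others))
print(permutation_p_value(in_subtype, others))
```

A gene would pass the permutation criterion described in the text when its observed T-statistic exceeds the permuted value in at least 999 of 1000 permutations, that is, when the returned fraction is at most 0.001.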

Generally, using the top 20-40 genes did not result in significant changes to 
subtype prediction accuracy. Accordingly, the top 20 genes were used for subtype 
prediction, unless noted otherwise. 
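
The ranking and top-k selection described above can be sketched as follows. This is an illustrative outline under stated assumptions: the signed statistic, the `top_genes` helper, the dictionary layout, and the probe names are hypothetical, and the "Above"/"Below" label is inferred from the tables, where positive T-stat values appear alongside "Above" and negative values alongside "Below".

```python
import math
from statistics import mean, variance


def signed_t(in_class, rest):
    """Signed T: positive when the class mean exceeds the mean of the rest."""
    # Assumes each group has at least two samples and some spread.
    denom = math.sqrt(variance(in_class) / len(in_class)
                      + variance(rest) / len(rest))
    return (mean(in_class) - mean(rest)) / denom


def top_genes(expression, k=20):
    """Rank probes by |T| and keep the k strongest discriminators."""
    rows = []
    for probe_id, (in_class, rest) in expression.items():
        t = signed_t(in_class, rest)
        rows.append((probe_id, t, "Above" if t > 0 else "Below"))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:k]


# Hypothetical data: probe -> (values in the subtype, values in all other cases).
expression = {
    "probe_a": ([9.0, 10.0, 11.0], [1.0, 2.0, 3.0]),
    "probe_b": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "probe_c": ([1.0, 2.0, 3.0], [8.0, 9.0, 10.0]),
}
for probe_id, t, direction in top_genes(expression, k=2):
    print(probe_id, round(t, 4), direction)
```

With `k=20` this mirrors the choice of the top 20 genes for subtype prediction; raising `k` to 40 corresponds to the longer lists shown in the tables.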
Table 16. Genes Selected by T statistics for BCR-ABL 

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 12.0346 | Above
2 | 36194_at | low density lipoprotein-related protein-associated protein 1 alpha-2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | -11.3077 | Below
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 10.6627 | Above
4 | 37397_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. |  |  | 10.2460 | Above
5 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 10.0540 | Above
6 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 9.9147 | Above
7 | 202_at | heat shock transcription factor 2 | HSF2 | M65217 | -9.7639 | Below
8 |  | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | 9.6562 | Above
9 | [illegible] | -containing protein SH3GLB1 | SH3GLB1 | AB007960 | 9.5307 | Above
10 | [illegible] | hemopoietic cell kinase | HCK | M16592 | -9.3898 | Below
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 9.3382 | Above
12 | [illegible] | protein tyrosine phosphatase non-receptor type 9 | PTPN9 | M83738 | -9.2414 | Below
13 | [?]5991_at | Sm protein F | LSM6 | AA917945 | 9.0298 | Above
14 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 8.9732 | Above
15 | [illegible] | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 8.6474 | Above
16 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 6.4291 | Above
17 | 36683_at | matrix Gla protein | MGP | AI953789 | -8.3872 | Below
18 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 8.2583 | Above
19 | 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 8.2283 | Above
20 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 8.2275 | Above
21 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 8.2080 | Above
22 | 34759_at | Human hbc647 mRNA sequence |  | U68494 | 8.1863 | Above
23 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced cancers 1 | PTEN | U92436 | 8.1671 | Above






24 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 |  | Above
25 | 40264_g_at | zinc finger protein-like 1 | ZFPL1 | AF001891 | 8.13[?]4 | Above
26 | 36129_at | KIAA0397 gene product | KIAA0397 | AB007857 | 8.0041 | Above
27 | 551_at | E1A binding protein p300 | EP300 | U01877 | -7.7578 | Below
28 |  | centrosomal protein 1 | CEP1 | AF083322 | -7.7431 | Below
29 | 41137_at | myosin phosphatase target subunit 2 | MYPT2 | AB007972 | -7.7301 | Below
30 | 39068_at | protein phosphatase 2 regulatory subunit B B56 delta isoform | PPP2R5D | L76702 | -7.6161 | Below
31 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 7.5830 | Above
32 | 34314_at | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 7.5778 | Above
33 | 39519_at | KIAA0692 protein | KIAA0692 | AB014592 | 7.4662 | Above
34 | 32788_at | RAN binding protein 2 | RANBP2 | D42063 | 7.4114 | Above
35 | 34882_at | nucleolar protein KKE/D repeat | NOP56 | Y12065 | 7.3622 | Above
36 | 2064_g_at | excision repair cross-complementing rodent repair deficiency complementation group 5 | ERCC5 | L20046 | 7.3597 | Above
37 | 41836_at | protein with polyglutamine repeat calcium ca2 homeostasis endoplasmic reticulum protein | ERPROT213-21 |  | 7.3350 | Above
38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 |  | Above
39 | 37047_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 7.2357 | Above
40 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | -7.2252 | Below



Table 17. Genes Selected by T statistics for E2A-PBX1 

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 126.7442 | Above
2 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 36.6116 | Above
3 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 30.7577 | Above
4 | 717_at | GS3955 protein | GS3955 | D87119 | 23.7813 | Above
5 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | -22.8956 | Below
6 | 33641_g_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | -20.4637 | Below
7 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -20.1554 | Below
8 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 19.6467 | Above
9 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 18.8419 | Above
10 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 17.8214 | Above
11 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | -17.7944 | Below
12 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | -17.6553 | Below
13 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44 exon 9 and complete cds. |  | D28915 | -17.3074 | Below
14 | 40113_at | GS3955 protein | GS3955 | D87119 | 16.7288 | Above
15 | 2031_s_at | cyclin-dependent kinase inhibitor 1A p21 Cip1 | CDKN1A | U03106 | -14.9826 | Below
16 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | -14.8016 | Below
17 | [illegible] | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 14.7180 | Above
18 | [illegible] | Homo sapiens mRNA cDNA |  | AL049435 | -14.4522 | Below
19 | 268_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | -13.7540 | Below
20 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 13.6403 | Above
21 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 13.5099 | Above
22 | 38580_at | guanine nucleotide binding protein G protein q polypeptide | GNAQ | U43083 | -12.8525 | Below
23 | 40049_at | death-associated protein kinase 1 | DAPK1 | X76104 | -12.3837 | Below
24 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 12.3436 | Above
25 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 |  | AL049397 | 12.2102 | Above
26 | 430_at | nucleoside phosphorylase | NP | [illegible] |  | Above
27 | 37975_at | cytochrome b-245 beta polypeptide chronic granulomatous disease | CYBB | [illegible] | [illegible] | Below
28 | 34862_at | CGI-49 protein | LOC51097 | AA005018 | 12.0264 | Above
29 | [illegible] | X-box binding protein 1 | XBP1 | Z93930 | -11.9796 | Below
30 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -11.9492 | Below
31 | 37304_at | chromobox homolog 1 Drosophila HP1 beta | CBX1 | U35451 | 11.9422 | Above
32 | 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 11.9051 | Above
33 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 11.7327 | Above
34 | [illegible] | colony stimulating factor 3 receptor granulocyte | CSF3R | M59820 | -11.6814 | Below
35 | 37493_at | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB |  | [illegible] | Above
36 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 11.4021 | Above
37 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | 11.2865 | Above
38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | -11.1361 | Below
39 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 11.0984 | Above
40 | [illegible] | ornithine decarboxylase 1 | ODC1 | X16277 | 10.9475 | Above






Table 18. Genes Selected by T statistics for Hyperdiploid > 50 

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | [illegible] | 9.1574 | Above
2 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | -6.9008 | Below
3 | 37543_at | Rac/Cdc42 guanine exchange factor GEF 6 | ARHGEF6 | D25304 | 6.8366 | Above
4 | 41470_at | prominin mouse like 1 | PROML1 | AF027208 | 6.7290 | Above
5 | 31492_at | muscle specific gene | M9 | AB019392 | -6.6885 | Below
6 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 6.4051 | Above
7 | 1915_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | V01512 | 6.4008 | Above
8 | 37677_at | phosphoglycerate kinase 1 | PGK1 | V00572 | 6.2865 | Above
9 | 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | -6.2299 | Below
10 | 36795_at | prosaposin variant Gaucher disease and variant metachromatic leukodystrophy | PSAP | J03077 | 6.1812 | Above
11 | 40875_s_at | small nuclear ribonucleoprotein 70kD polypeptide RNP antigen | SNRP70 | X06815 | -6.0877 | Below
12 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 6.0804 | Above
13 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | [illegible] | 6.0244 | Above
14 | 39168_at | Ac-like transposable element | ALTE | AB018328 | 5.9336 | Above
15 | 955_at | calmodulin type I | CALM1 | HG1862-HT1897 | 5.8650 | Above
16 | 38604_at | neuropeptide Y | NPY | AI198311 | 5.8313 | Above
17 | 39147_g_at | alpha thalassemia/mental retardation syndrome X-linked RAD54 S. cerevisiae homolog | ATRX | U72936 | [illegible] | Above
18 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | -5.6901 | Below
19 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 5.6688 | Above
20 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 5.6605 | Above
21 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | -5.5877 | Below
22 | 32553_at | MYC-associated zinc finger protein purine-binding transcription factor | MAZ | M94046 | -5.5000 | Below
23 | 36169_at | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD MWFE | NDUFA1 | N47307 | 5.4376 | Above
24 | 1817_at | prefoldin 5 | PFDN5 | D89667 | -5.4110 | Below
25 | 578_at | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | -5.4026 | Below
26 | 1556_at | RNA binding motif protein 5 | RBM5 | U23946 | -5.3032 | Below
27 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 5.2349 | Above
28 | 37294_at | B-cell translocation gene 1 anti-proliferative | BTG1 | X61123 | -5.1877 | Below
29 | 1447_at | proteasome prosome macropain subunit beta type 1 | PSMB1 | D00761 | 5.1699 | Above
30 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 5.1200 | Above
31 | 33307_at | kraken-like | BK126B4.1 | AL022316 | -5.0984 | Below
32 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | -5.0822 | Below
33 | 34336_at | lysyl-tRNA synthetase | KARS | D32053 | -5.0692 | Below
34 | 41143_at | Human calmodulin (CALM1) gene, exons 2,3,4,5 and 6, and complete cds | CALM1 | U12022 | 5.0543 | Above
35 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 5.0373 | Above
36 | 35298_at | eukaryotic translation initiation factor 3 subunit 7 zeta 66/67kD | EIF3S7 | U54558 | -4.9499 | Below
37 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | -4.9228 | Below
38 | 36629_at | glucocorticoid-induced leucine zipper | GILZ | AI635895 | 4.8061 | Above
39 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 4.7968 | Above
40 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | 4.7446 | Above



Table 19. Genes Selected by T statistics for MLL 

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -16.8244 | Below
2 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | -15.4460 | Below
3 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | -13.6764 | Below
4 | 36908_at | Human macrophage mannose receptor (MRC1) gene, exon 30. | MRC1 | M93221 | -11.8629 | Below



5 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | 11.0223 | Above
6 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 10.4318 | Above
7 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | -10.1815 | Below
8 | 39721_at | ephrin-B1 | EFNB1 | U09303 | -9.6158 | Below
9 | 39402_at | interleukin 1 beta | IL1B | M15330 | -9.5998 | Below
10 | 1737_s_at | insulin-like growth factor-binding protein 4 |  |  | -9.4119 | Below
11 | 37413_at | dipeptidase 1 renal | DPEP1 | J05257 | -9.4101 | Below
12 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 9.3163 | Above
13 | 1971_g_at | fragile histidine triad gene | FHIT | U46922 | -9.2257 | Below
14 | 1983_at | cyclin D2 | CCND2 | X68452 | -9.2213 | Below
15 | [illegible] | KIAA1069 protein | KIAA1069 | AB028992 | -9.1951 | Below
16 | 40520_g_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 9.1099 | Above
17 | 1718_at | actin related protein 2/3 complex subunit 2 34 kD | ARPC2 | U50523 | 9.0435 | Above
18 | 34237_at | HBS1 S. cerevisiae like | HBS1L | AB028961 | -8.8208 | Below
19 | 1726_at | DNA polymerase, epsilon, catalytic subunit |  | HG919-HT919 | -8.4664 | Below
20 | 36643_at | discoidin domain receptor family member 1 | DDR1 | L20817 | -8.4627 | Below
21 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 |  | [illegible] | Below
22 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 |  | AL049397 | 8.2974 | Above
23 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -8.1177 | Below
24 | 564_at | guanine nucleotide binding protein G protein alpha 11 Gq class | GNA11 | M69013 | -8.1107 | Below
25 | 39705_at | KIAA0700 protein | KIAA0700 | AB014600 | -7.9334 | Below
26 | 36105_at | Human nonspecific crossreacting antigen mRNA, complete cds. | NCA | M18728 | -7.6911 | Below
27 | 174_s_at | intersectin 2 | ITSN2 | U61167 | 7.5752 | Above
28 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -7.4767 | Below
29 | 40436_g_at | solute carrier family 25 mitochondrial carrier adenine nucleotide translocator member 6 | SLC25A6 |  | [illegible] | Above
30 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | X62055 | 7.2192 | Above
31 | [illegible] | KIAA0736 gene product | KIAA0736 | [illegible] | -7.0718 | Below
32 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 6.9829 | Above
33 | 41762_at | TIA1 cytotoxic granule-associated RNA-binding protein-like 1 | TIAL1 | D64015 | -6.9118 | Below




34 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | -6.7734 | Below
35 | 39967_at | leucine zipper down-regulated in cancer 1 | LDOC1 | AB019527 | -6.7415 | Below
36 | 188_at | ephrin-B1 | EFNB1 | U09303 | -6.5964 | Below
37 | 160033_s_at | X-ray repair complementing defective repair in Chinese hamster cells 1 | XRCC1 | NM_006297 | -6.5936 | Below
38 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | W28589 | -6.5774 | Below
39 | 37398_at | platelet/endothelial cell adhesion molecule CD31 antigen | PECAM1 | AA100961 | -6.5675 | Below
40 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | -6.5584 | Below



Table 20. Genes Selected by T statistics for Novel Risk Group 

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | -40.5168 | Below
2 | [illegible] | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 33.4654 | Above
3 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 24.7557 | Above
4 | [illegible] | KIAA1099 protein | KIAA1099 | AB029022 | 14.0491 | Above
5 | [illegible] | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4548 | Above
6 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 10.9971 | Above
7 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 10.0370 | Above
8 | 40585_at | adenylate cyclase 7 | ADCY7 | D25538 | -9.5897 | Below
9 | 33284_at | myeloperoxidase | MPO | M19507 | -9.4724 | Below
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 9.4489 | Above
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | -9.1387 | Below
12 | 37712_g_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | -9.1225 | Below
13 | 38576_at | H2B histone family member B | H2BFB | AJ223353 | -9.0869 | Below
14 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | -8.7026 | Below
15 | 33907_at | eukaryotic translation initiation factor 4 gamma 3 | EIF4G3 | AF012072 | -8.3540 | Below
16 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | -8.3212 | Below
17 | 402_s_at | intercellular adhesion molecule 3 | ICAM3 | X69819 | -7.9741 | Below
18 | 35112_at | regulator of G-protein signalling 9 | RGS9 | AF071476 | 7.8348 | Above
19 | 34850_at | ubiquitin-conjugating enzyme E2E 3 homologous to yeast UBC4/5 | UBE2E3 | AB017644 | 7.8197 | Above
20 | 37030_at | KIAA0887 protein | KIAA0887 | AB020694 | -7.6343 | Below

21 | 36322_at | fucosyltransferase 7 alpha 1 3 fucosyltransferase | FUT7 | AB012668 | -7.6240 | Below
22 | 39509_at | Homo sapiens cDNA FLJ22071 |  | AI692348 | -7.6232 | Below
23 | 40091_at | B-cell CLL/lymphoma 6 zinc finger protein 51 | BCL6 | U00115 | -7.6171 | Below
24 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 7.5991 | Above
25 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 7.5824 | Above
26 | 831_at | DEAD/H Asp-Glu-Ala-Asp/His box polypeptide 10 RNA helicase | DDX10 | U28042 | 7.4276 | Above
27 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | -7.2991 | Below
28 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 7.2985 | Above
29 | 36958_at | zyxin | ZYX | X95735 | -7.2889 | Below
30 | 36564_at | Human DNA sequence from clone RP5-1174N9 on chromosome 1p34.1-35.3 |  | W27419 | -7.2848 | Below
31 | 32174_at | solute carrier family 9 sodium/hydrogen exchanger isoform 3 regulatory factor 1 | SLC9A3R1 | AF015926 | -7.2749 | Below
32 | 619_s_at | membrane-spanning 4-domains subfamily A member 2 Fc fragment of IgE high affinity I receptor for beta polypeptide | MS4A2 | M27394 | -7.2325 | Below
33 | 40749_at | membrane-spanning 4-domains subfamily A member 2 Fc fragment of IgE high affinity I receptor for beta polypeptide | MS4A2 | X07203 | -7.2063 | Below
34 | 31894_at | centromere protein C 1 | CENPC1 | M95724 | 6.9679 | Above
35 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 6.8225 | Above
36 | 38259_at | syntaxin binding protein 2 | STXBP2 | AB002559 | -6.6992 | Below
37 | 35629_at | hypothetical protein | DJ1042K10.2 | AL022238 | -6.6968 | Below
38 | 38700_at | cysteine and glycine-rich protein 1 | CSRP1 | M33146 | -6.6962 | Below
39 | 37397_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | -6.6934 | Below
40 | 41127_at | solute carrier family 1 glutamate/neutral amino acid transporter member 4 | SLC1A4 | L14595 | -6.6892 | Below
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Table 21. Genes Selected by T statistics for T-ALL 



No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 38242_at | B cell linker protein | SLP65 | AF068180 | [illegible] | Below
2 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 27.6995 | Above
3 | [illegible] | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | -23.7294 | Below
4 | [illegible] | SH2 domain protein 1A Duncan's disease lymphoproliferative syndrome | SH2D1A | AL023657 | 22.4501 | Above
5 | 38522_s_at | CD22 antigen | CD22 | | | Below
6 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | -19.1460 | Below
7 | [illegible] | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 19.0859 | Above
8 | [illegible] | neuropeptide Y | NPY | AI198311 | -18.8194 | Below
9 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | -18.6383 | Below
10 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | -18.5620 | Below
11 | 36638_at | connective tissue growth factor | CTGF | X78947 | -18.2772 | Below
12 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 17.9081 | Above
13 | 32174_at | solute carrier family 9 sodium/hydrogen exchanger isoform 3 regulatory factor 1 | SLC9A3R1 | AF015926 | 17.4427 | Above
14 | 160041_at | protein tyrosine phosphatase non-receptor type 18 brain-derived | PTPN18 | X79568 | -17.3412 | Below
15 | 38521_at | CD22 antigen | CD22 | X59350 | -17.0388 | Below
16 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | -16.7948 | Below
17 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 | -16.7508 | Below
18 | 1096_g_at | CD19 antigen | CD19 | M28170 | -16.4583 | Below
19 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | -16.2017 | Below
20 | 41710_at | hypothetical protein | LOC54103 | AL079277 | -15.9099 | Below
21 | 599_at | H2.0 Drosophila like homeo box 1 | HLX1 | M60721 | -15.5425 | Below
22 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | -15.0123 | Below
23 | 36502_at | PFTAIRE protein kinase 1 | PFTK1 | AB020641 | -14.9972 | Below
24 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -14.9886 | Below
25 | 37539_at | RalGDS-like gene KIAA0959 protein | KIAA0959 | AB023176 | -14.6872 | Below
26 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | 14.5666 | Above
27 | 34033_s_at | leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2 | LILRA2 | AF025531 | -14.3809 | Below
28 | 2031_s_at | cyclin-dependent kinase inhibitor 1A p21 Cip1 | CDKN1A | U03106 | -14.1071 | Below
29 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 14.0743 | Above
30 | 35794_at | KIAA0942 protein | KIAA0942 | AB023159 | -13.9659 | Below
31 | 41156_g_at | catenin cadherin-associated protein alpha 1 102kD | CTNNA1 | U03100 | -13.8135 | Below
32 | 32979_at | GRB2-associated binding protein 1 | GAB1 | U43885 | -13.5842 | Below
33 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | -13.4209 | Below
34 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -13.4172 | Below
35 | 36108_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M16276 | -13.3518 | Below
36 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | -13.2672 | Below
37 | 41153_f_at | Homo sapiens alphaE-catenin (CTNNA1) gene, exon 18 and complete cds. | CTNNA1 | AF102803 | -12.7927 | Below
38 | 37710_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | L08895 | -12.7716 | Below
39 | 39893_at | guanine nucleotide binding protein G protein gamma 7 | GNG7 | AB010414 | -12.7696 | Below
40 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | -12.7353 | Below



Table 22. Genes Selected by T statistics for TEL-AML1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 15.2209 | Above
2 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 15.0804 | Above
3 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | 14.9774 | Above
4 | 37780_at | piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 14.1405 | Above
5 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | 12.9369 | Above
6 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | 12.5429 | Above
7 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | -12.5035 | Below
8 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 12.3871 | Above
9 | 34194_at | Homo sapiens cDNA FLJ21697 | | AL049313 | 12.1089 | Above
10 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4322 | Above
11 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 11.0625 | Above
12 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 11.0133 | Above
13 | [illegible] | Homo sapiens mRNA [remainder illegible] | | AL080190 | 10.8763 | Above
14 | 32730_at | Homo sapiens mRNA for KIAA1750 | | AL080059 | 10.7439 | Above
15 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 10.5332 | Above
16 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 10.3692 | Above
17 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | 10.2921 | Above
18 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | 10.0568 | Above
19 | [illegible] | Rho-specific guanine nucleotide exchange factor p114 | GEF | [illegible] | [illegible] | Above
20 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 9.8662 | Above
21 | 1936_s_at | proto-oncogene c-myc, alt. transcript 3, ORF 114 | | HG3523-HT4899 | -9.6621 | Below
22 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 9.4563 | Above
23 | 38763_at | Human (clone D21-1) L-iditol-2 dehydrogenase gene, exon 9 and complete cds. | | L29254 | -9.2719 | Below
24 | 41295_at | GTT1 protein | GTT1 | AL041780 | -9.1813 | Below
25 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 9.1682 | Above
26 | 38570_at | major histocompatibility complex class II DO beta | HLA-DOB | X03066 | 9.0394 | Above
27 | 32163_f_at | EST | | AA216639 | 9.0392 | Above
28 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 8.9931 | Above
29 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 8.9571 | Above
30 | 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | 8.8075 | Above
31 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 8.7321 | Above
32 | 33447_at | myosin light polypeptide regulatory non-sarcomeric 20kD | MLCB | X54304 | -8.6848 | Below
33 | 35362_at | myosin X | MYO10 | AB018342 | 8.6700 | Above
34 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | 8.5010 | Above
35 | 324_f_at | basic transcription factor 3 | BTF3 | HG1515-HT1515 | -8.4705 | Below
36 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | -8.3219 | Below
37 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | 8.2693 | Above
38 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B cells inhibitor-like 1 | NFKBIL1 | Y14768 | 8.2000 | Above
39 | 41442_at | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | AB010419 | 8.0604 | Above
40 | 36275_at | Homo sapiens mRNA from chromosome 5q21-22 clone FBR89 | | AB002438 | 7.8550 | Above

4. Wilkins' 

This method of selecting genes uses the weighted sum of three components to
estimate the discriminative value of each gene. The higher the score, the better the
gene is at discriminating between the two classes. The input to the scoring method is
preprocessed and normalized data. The idea of the metric is that a gene is a good
discriminator if: (1) it is expressed in one class and not in the other, or it is
expressed in both classes but significantly more so in one than the other, or (2) the
gene is present in most samples, and the data are pure, in the sense that there is a
threshold expression value for the gene where the gene generally has expression
levels larger than the threshold in one class, and smaller than the threshold in the other
class. The components of the metric were quantified as follows. For a gene, assume
PR_1 is the ratio of "present" samples to all samples in class 1, where present means
that the gene's expression value was not preprocessed to a constant (1). Assume PR_2
is defined similarly for class 2. The first component of the metric, M_1, is estimated as
the absolute difference between PR_1 and PR_2. This value is between 0 (when the gene
is equally present in both classes) and 1 (when the gene is expressed in one class and
not in the other). The second component of the metric, M_2, measures the extent to
which the gene is present overall, and is defined as the average of PR_1 and PR_2. The
final component, M_3, estimates the "purity", or existence of a threshold value. The
gene expression values for the present samples are sorted into ascending order and a
vector of their class labels is built, for example {+, +, +, -, -, -, +, +, -}. The next
step is to find the best place to partition the samples so that the expression values for
one class (maybe +) are less than the partition point, and the values from the other
class are larger. Let L_C1 and L_C2 be the number of class 1 and class 2 samples on the
left side of the partition, respectively. Assume R_C1 and R_C2 are defined similarly for
the right side of the partition. Then the purity is estimated as:

max{L_C1 - L_C2 + R_C2 - R_C1, L_C2 - L_C1 + R_C1 - R_C2} / (total number of present samples).

Each possible partition is checked. In the example above, the partition
{+, +, +, || -, -, -, +, -, +, -} is the best partition, with a purity value of
M_3 = 7/11 = 0.64. The score for the gene is the weighted sum
0.5*M_1 + 0.25*M_2 + 0.25*M_3. The top 50 genes for each subgroup
selected by this metric are listed in Tables 23-29. For class prediction all 50 genes
were used, unless otherwise stated.
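
The scoring steps described above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the inventors' implementation: the function names, the `FLOOR` constant (standing in for the constant to which absent values are preprocessed), and the 11-sample example vector are assumptions. The example vector was chosen so that it reproduces the purity of 7/11 quoted above. Ties between expression values are broken arbitrarily by the sort, and both classes are assumed to be non-empty.

```python
from typing import Sequence

FLOOR = 1.0  # assumption: the constant assigned to "absent" expression values


def purity(labels_sorted: Sequence[int]) -> float:
    """M_3: best partition of class labels (1 or 2) sorted by expression value."""
    n = len(labels_sorted)
    if n == 0:
        return 0.0
    total1 = sum(1 for c in labels_sorted if c == 1)
    total2 = n - total1
    left1 = left2 = 0
    best = 0
    # Check every possible partition point, including both ends.
    for k in range(n + 1):
        right1, right2 = total1 - left1, total2 - left2
        best = max(best,
                   left1 - left2 + right2 - right1,
                   left2 - left1 + right1 - right2)
        if k < n:
            if labels_sorted[k] == 1:
                left1 += 1
            else:
                left2 += 1
    return best / n


def wilkins_score(values: Sequence[float], labels: Sequence[int],
                  floor: float = FLOOR) -> float:
    """Weighted sum 0.5*M_1 + 0.25*M_2 + 0.25*M_3 for one gene."""
    n1 = sum(1 for c in labels if c == 1)
    n2 = len(labels) - n1
    # A sample is "present" when its value was not floored to the constant.
    present = [(v, c) for v, c in zip(values, labels) if v != floor]
    pr1 = sum(1 for _, c in present if c == 1) / n1
    pr2 = sum(1 for _, c in present if c == 2) / n2
    m1 = abs(pr1 - pr2)      # difference in presence between the classes
    m2 = (pr1 + pr2) / 2     # overall presence
    m3 = purity([c for _, c in sorted(present, key=lambda p: p[0])])
    return 0.5 * m1 + 0.25 * m2 + 0.25 * m3


# Hypothetical 11-sample example: class 1 plays the role of "+", class 2 of "-".
example_labels = [1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2]
example_values = [float(v) for v in range(2, 13)]  # already ascending, all present
example_purity = purity(example_labels)
example_score = wilkins_score(example_values, example_labels)
```

In this hypothetical example every sample is present, so M_1 = 0 and M_2 = 1, and the score is driven by the purity term alone.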



Table 23. Genes Selected by Wilkins' for BCR-ABL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 0.6354 | Above
2 | 37479_at | CD72 antigen | CD72 | M54992 | 0.6352 | Below
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 0.6265 | Above
4 | 37397_at | platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene | PECAM | L34657 | 0.6161 | Above
5 | 33162_at | insulin receptor | INSR | X02160 | 0.6118 | Below
6 | 39691_at | SH3-containing protein SH3GLB1 | SH3GLB1 | AB007960 | 0.6089 | Above
7 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | 0.6087 | Above
8 | 34759_at | Human hbc647 mRNA sequence | | U68494 | 0.6061 | Above
9 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 0.6040 | Above
10 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | 0.6021 | Above
11 | 38312_at | DKFZp5640222 from clone DKFZp5640222 | | AL050002 | 0.6010 | Above
12 | 35970_g_at | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 0.5989 | Above
13 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 0.5989 | Above
14 | 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 0.5980 | Above
15 | 40953_at | calponin 3 acidic | CNN3 | S80562 | 0.5972 | Above
16 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced cancers 1 | PTEN | U92436 | 0.5963 | Below
17 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 0.5953 | Above
18 | 35991_at | Sm protein F | LSM6 | AA917945 | 0.5938 | Above
19 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 0.5938 | Above
20 | 38032_at | KIAA0736 gene product | KIAA0736 | AB018279 | 0.5934 | Above
21 | 1983_at | cyclin D2 | CCND2 | X68452 | 0.5927 | Above
22 | 36194_at | low density lipoprotein-related 2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | 0.5914 | Below
23 | 34460_at | peripheral benzodiazepine receptor-associated protein 1 | PRAX-1 | AB014512 | 0.5911 | Above
24 | [illegible] | ataxia telangiectasia mutated includes complementation groups A C and D | ATM | U26455 | 0.5910 | Above
25 | 31443_at | AML1 | AML1 | S76346 | 0.5896 | Above
26 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 0.5896 | Above
27 | [illegible] | mannosidase beta A lysosomal | MANBA | U60337 | 0.5887 | Below
28 | 36099_at | splicing factor arginine/serine-rich 1 splicing factor 2 alternate splicing factor | SFRS1 | M69040 | 0.5877 | Below
29 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 0.5858 | Above
30 | 34314_at | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 0.5858 | Below
31 | 36129_at | KIAA0397 gene product | KIAA0397 | AB007857 | 0.5858 | Above
32 | 40264_g_at | zinc finger protein-like 1 | ZFPL1 | AF001891 | 0.5858 | Above
33 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 0.5852 | Above
34 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 0.5832 | Above
35 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 0.5832 | Above
36 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.5832 | Above
37 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 0.5832 | Above
38 | 2064_g_at | excision repair cross-complementing rodent repair deficiency complementation group | ERCC5 | L20046 | 0.5832 | Above
39 | 39729_at | Human natural killer cell enhancing factor (NKEFB) mRNA, complete cds. | NKEFB | L19185 | 0.5829 | Below
40 | | poly ADP-ribose glycohydrolase | PARG | [illegible] | [illegible] | Below
41 | 40613_at | uncharacterized hypothalamus protein HT012 | HT012 | AL031775 | 0.5819 | Below
42 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 0.5813 | Above
43 | 40782_at | short-chain dehydrogenase/reductase 1 | SDR1 | AF061741 | 0.5813 | Above
44 | 34256_at | sialyltransferase 9 CMP-NeuAc lactosylceramide alpha-2 3-sialyltransferase GM3 synthase | SIAT9 | AB018356 | 0.5797 | Above
45 | 41836_at | protein with polyglutamine repeat calcium ca2 homeostasis endoplasmic reticulum protein | ERPROT213-21 | U94836 | 0.5777 | Above
46 | 35681_r_at | zinc finger homeobox 1B | ZFHX1B | AB011141 | 0.5759 | Below
47 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | 0.5759 | Below
48 | 32788_at | RAN binding protein 2 | RANBP2 | D42063 | 0.5756 | Above
49 | 828_at | prostaglandin E receptor 2 subtype EP2 53kD | PTGER2 | U19487 | 0.5740 | Above
50 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 0.5737 | Above




Table 24. Genes Selected by Wilkins' for E2A-PBX1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | [illegible] | pre-B-cell leukemia transcription factor 1 | PBX1 | [illegible] | [illegible] | Above
2 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 0.8252 | Below
3 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 0.8040 | Above
4 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 0.7899 | Above
5 | 753_at | nidogen 2 | NID2 | D86425 | 0.7368 | Above
6 | 717_at | GS3955 protein | GS3955 | D87119 | 0.7306 | Above
7 | 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | | [illegible] | Above
8 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | [illegible] | Below
9 | 1065_at | fms-related tyrosine kinase 3 | FLT3 | U02687 | 0.7160 | Below
10 | 36650_at | cyclin D2 | CCND2 | D13639 | 0.7151 | Below
11 | 33513_at | [illegible] molecule | SLAM | U33017 | [illegible] | Above
12 | 33748_at | minor histocompatibility antigen HA-1 | KIAA0223 | D86976 | 0.7084 | Below
13 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 0.7033 | Above
14 | 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | 0.7003 | Below
15 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 0.6982 | Above
16 | [illegible]_g_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6975 | Below
17 | 40468_at | KIAA0554 protein | KIAA0554 | AB011126 | [illegible] | Below
18 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | | Below
19 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.6938 | Below
20 | 362_at | | PRKCZ | Z15108 | [illegible] | Above
21 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | [illegible] | Below
22 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | 0.6875 | Below
23 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 0.6863 | Above
24 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | 0.6837 | Below
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 0.6763 | Above
26 | 41409_at | basement membrane-induced gene | ICB-1 | AF044896 | 0.6757 | Below
27 | 34892_at | tumor necrosis factor receptor superfamily member 10b | TNFRSF10B | AF016266 | 0.6726 | Below
28 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 0.6710 | Above
29 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6667 | Below
30 | 34583_at | fms-related tyrosine kinase 3 | FLT3 | U02687 | 0.6665 | Below
31 | 36900_at | stromal interaction molecule 1 | STIM1 | U52426 | 0.6650 | Below
32 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 0.6636 | Above
33 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 0.6609 | Above
34 | 1830_s_at | transforming growth factor beta 1 | TGFB1 | | [illegible] | Below
35 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | 0.6605 | Below
36 | 38254_at | KIAA0882 protein | KIAA0882 | AB020689 | 0.6539 | Below
37 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44, exon 9 and complete cds. | | D28915 | [illegible] | Below
38 | 33865_at | adenovirus 5 E1A binding protein | BS69 | AA127624 | 0.6515 | Below
39 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6502 | Below
40 | 40113_at | GS3955 protein | GS3955 | D87119 | 0.6476 | Above
41 | 32979_at | GRB2-associated binding protein 1 | GAB1 | U43885 | 0.6457 | Below
42 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.6427 | Below
43 | 38739_at | v-ets avian erythroblastosis virus E26 oncogene homolog 2 | ETS2 | AF017257 | 0.6424 | Below
44 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 0.6363 | Above
45 | 538_at | CD34 antigen | CD34 | S53911 | 0.6326 | Below
46 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 0.6318 | Above
47 | 41017_at | myosin-binding protein H | MYBPH | [illegible] | [illegible] | Above
48 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.6260 | Below
49 | 37281_at | KIAA0233 gene product | KIAA0233 | D87071 | 0.6250 | Below
50 | 35675_at | vinexin beta SH3-containing adaptor molecule-1 | SCAM-1 | AF037261 | 0.6229 | Below



Table 25. Genes Selected by Wilkins' for Hyperdiploid > 50

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | 0.5838 | Below
2 | 41470_at | Prominin mouse like 1 | PROML1 | AF027208 | 0.5616 | Above
3 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 0.5423 | Below
4 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 0.5399 | Above
5 | 578_at | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | 0.5208 | Below
6 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 0.5164 | Above
7 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | 0.5090 | Above
8 | 38604_at | neuropeptide Y | NPY | AI198311 | 0.5083 | Above
9 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | APT6M8-9 | AL049929 | 0.5080 | Above
10 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 0.5057 | Above
11 | 37272_at | inositol 1 4 5-trisphosphate 3-kinase B | ITPKB | X57206 | 0.5025 | Below
12 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | 0.5018 | Above
13 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 0.4977 | Below
14 | [illegible] | spleen tyrosine kinase | SYK | L28824 | 0.4964 | Below
15 | [illegible]_s_at | tyrosine kinase syk | syk | HG3730-HT4000 | 0.4913 | Below
16 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 0.4901 | Above
17 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | 0.4898 | Below
18 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 0.4895 | Above
19 | 33307_at | kraken-like | BK126B4.1 | AL022316 | 0.4880 | Below
20 | 38518_at | sex comb on midleg Drosophila like | SCML2 | Y18004 | 0.4879 | Above
21 | 39402_at | interleukin 1 beta | IL1B | M15330 | 0.4750 | Above
22 | [illegible] | phosphoribosyl pyrophosphate synthetase 1 | PRPS1 | D00860 | 0.4718 | Above
23 | 37747_at | Human annexin V (ANX5) gene exon 13. | ANX5 | U05770 | 0.4717 | Above
24 | 40200_at | heat shock transcription factor 1 | HSF1 | M64673 | 0.4689 | Below
25 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 0.4685 | Above
26 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AI249721 | 0.4675 | Below
27 | 1357_at | ubiquitin specific protease 4 proto-oncogene | USP4 | U20657 | 0.4670 | Below
28 | 36592_at | prohibitin | PHB | S85655 | 0.4668 | Above
29 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 0.4635 | Above
30 | 40891_f_at | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 0.4608 | Above
31 | 40846_g_at | interleukin enhancer binding factor 3 90Kd | ILF3 | U10324 | 0.4605 | Below
32 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | [illegible] | [illegible] | Above
33 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.4595 | Below
34 | | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 0.4594 | Above
35 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 | 0.4570 | Above
36 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | | 0.4568 | Above
37 | 38458_at | Human cytochrome b5 (CYB5) gene, exon 6 and complete cds. | CYB5 | L39945 | 0.4552 | Above
38 | 38869_at | KIAA1069 protein | KIAA1069 | AB028992 | 0.4549 | Above
39 | 915_at | interferon-induced protein with tetratricopeptide repeats 1 | IFIT1 | M24594 | 0.4544 | Above
40 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.4535 | Above
41 | 39301_at | calpain 3 p94 | CAPN3 | X85030 | | Below
42 | 41425_at | Friend leukemia virus integration 1 | FLI1 | | [illegible] | Below
43 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | [illegible] | Above
44 | 36605_at | transcription factor 4 | TCF4 | M74719 | 0.4497 | Above
45 | 37709_at | DNA segment numerous copies expressed probes GS1 gene | DXF68S1E | M86934 | 0.4493 | Above
46 | [illegible] | transmembrane trafficking protein | TMP21 | L40397 | 0.4488 | Above
47 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | [illegible] | [illegible] | Above
48 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 0.4466 | Above
49 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.4448 | Above
50 | 35843_at | Homo sapiens mRNA cDNA DKFZp434D0935 | | L40402 | 0.4443 | Above















Table 26. Genes Selected by Wilkins' 


for MLL 








Affymetrix 


Gene Name 


Gene 


Reference 


Train set 


Above/ 




number 




Symbol 


number 


score 


Below 
Mean 


1 


39402_at 


interleukin 1 beta 


IL1B 


M15330 


0.7355 


Below 


2 


307_at 


arachidonate 5 -lipoxygenase 


ALOX5 


J03600 


0.7221 


Below 


3 


1389_at 


membrane metallo-endopeptidase 
neutral endopeptidase 


MME 


J03779 


0.7178 


Below 






enkephalinase CALLA CD 10 








Below 


4 


37280_at 


MAD mothers against 
decapentaplegic Drosophila 


MADH1 


U59912 


0.7021 






homolog 1 








Below 


5 


36650_at 


cyclin D2 


CCND2 


D13639 


0.6759 


6 


37043_at 


inhibitor of DNA binding 3 
dominant negative helix-loop-helix 


ID3 


AL021154 


0.6743 


Below 






protein 








Below 


7 


1520_s_at 


interleukin 1 beta 


IL1B 


X04500 


0.6689 


8 


40913_at 


ATPase Ca transporting plasma 


ATP2B4 


W28589 


0.6684 


Below 






membrane 4 








Below 


9 


36536_at 


schwannomin interacting protein 1 


SCHIP-1 


AF070614 


0.6554 


10 37398_at 


platelet/endothelial cell adhesion 


PECAM1 


AA100961 


0.6548 


Below 






molecule CD31 antigen 








Below 


11 


39114_at 


decidual protein induced by 


DEPP 


AB022718 


0.6478 






progesterone 






0.6432 


Below 


12 37967_at 


lymphocyte antigen 117 


LY117 


AF000424 


13 


1325_at 


MAD mothers against 
decapentaplegic Drosophila 


MADH1 


U59423 


0.6421 


Below 






homolog 1 






0.6395 


Below 


14 


38336_at 


K1AA1013 protein 


KIAA1013 


AB023230 


15 


577_at 


midkine neurite growth-promoting 


MDK 


M94250 


0.6363 


Below 



factor 2 
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16 


38671_at 


KIAA0620 protein 


KIAA0620 


AB014520 


a 




17 


33412_at 


LGALS1 Lectin, galactoside- 


LGALS1 


A1535940 


A AIKI 
[J.ODJ l 


Above 






binding, soluble, 1 










18 


4045 l_at 


hypothetical protein FLJ21434 


"CT TO 1 A"XA 


AT OS 090^ 


0 63 SO 


Below 


19 


36908_at 


Human macrophage mannose 


1V11VV-' 1 


M93221 


0.6290 


Below 






receptor (3VIRC1) gene, exon 30. 










20 


963_at 


ligase IV DNA ATP -dependent 


LIG4 


X83441 


0.6282 


Below 


21 


41346_at 


like-glycosyltransferase 


LARGE 


AJ007583 


a <o 1 A 


iseiow 


99 




mpmhranp •nrntftiTi "nalmitovlated 1 


MPP1 


M64925 


0.6155 


Below 






55kD 






0.6145 


Above 


Z3 


on^o ot 

ZUOZ at 


inculin crmwtVi fa pfAr bindinff 

lllo LLULL1 11 JvO til U will ldoiui l/ixiuj-u^ 


IGFBP7 


L19182 






protein 7 






0.6137 


Below 


24 


38408_at 


transmembrane 4 superfamily 


TM4SF2 


L10373 






member 2 










25 


854_at 


B lymphoid tyrosine kinase 




o /0O1 / 


0 6075 


Above 


26 


32193_at 


plexin CI 


PLXNC1 


AF030339 


0.6065 


Above 


27 


35939 s at 


POU domain class 4 transcription 


POU4F1 


L20433 


0.6046 


Below 






i.aUlUl 1 








Below 




33705 at 


phosphodiesterase 4B cAMP- 


PDE4B 


L20971 


0.5991 




















phosphodiesterase E4 






0.5979 


Below 


29 


34168_at 


deoxynucleotidyltransferase 


DNTT 


Ml 1722 






terminal 








Below 


30 36383 at 


v-ets avian erythroblastosis virus 




A/T179^A 
IV1 1 / jLJH 


0 5976 






T?9^* nnrnopnp rplatprl 










31 


3896S_at 


SH3 -domain binding protein 5 


oilJDrJ 


AB005047 


0.5976 


Below 






BTK-associated 










32 39263_at 


2 5 oligoadenylate syntlietase 2 


UAoZ 


IVlO / t tj*T 


0.5967 


Below 


33 


39329_at 


actinin alpha 1 


AU 1 IN I 


S\. 1 JO V*t 


0.5953 


Below 


34 


34699_at 


CD2-associated protein 




AT 050105 


0.5945 


Below 


35 


1267_at 


protein kinase C eta 








Below 


36 35172_at 


tyrosylprotein sulfotransferase 2 


TPST2 


AF049891 


0.5937 


Below 


37 


38124_at 


midkine neurite growtli-promoting 


MDK 


X55110 


0.5936 


Below 






factor 2 










38 


33813_at 


tumor necrosis factor receptor 


TNFRSF1B 


AI813532 


0.5934 


Below 






superfamily member IB 










39 


34176_at 


hypothetical protein from clone 643 LOC57228 


AF091087 


0.5930 


Below 


40 


39424_at 


tumor necrosis factor receptor 


TNFRSF14 


U70321 


0.5930 


Below 






superfamily member 14 herpesvirus 












entry mediator 








Below 


41 


40729_s__at 


nuclear factor of kappa light 


NFKBIL1 


Y14768 


0.5905 






polypeptide gene enhancer in B- 














cells inhibitor-like 1 










42 


32607_at 


brain acid-soluble protein 1 


BASP1 


ArUiytOO 




Above 


43 


38342_at 


KIAA0239 protein 


KIAA0239 


D87076 




Below 


44 32533 s at 


vesicle-associated membrane 


VAMP5 


AF0548^5 


0.5880 


Below 






protein 5 myobrevin 










45 


39330_s_at 


actinin alpha 1 


ACTN1 


M95178 


0.5867 


Below 
46 40519_at 


protein tyrosine phosphatase 


PTPRC 


Y00638 


0.5848 


Above 






receptor type C 






A CO A A 

0.5844 


Above 


47 39338 at 


SI 00 calcium-binding protein A 10 


S100A10 


AI201310 






annexin II hgand calpactm I light 














poiypepuue pi j 






0.5824 


Below 


48 


35940_at 


POTT Hnmain r1a<?<? 4 transcriDtion 


POU4F1 


X64624 






factor 1 








Below 


49 39712_at 


SI 00 calcium-binding protein A13 


S100A13 


AI541308 


0.5818 


50 39379 at 


Homo sapiens mRNA cDNA 




AL049397 


0.5811 


Above 






DKFZp586C1019 from clone 














DKFZp586C1019 














Table 27: Genes Selected by Wilkins' for Novel Risk Group

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.8668 | Above
2 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | 0.8614 | Below
3 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.8505 | Above
4 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 0.7694 | Above
5 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.7399 | Below
6 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | 0.7298 | Above
7 | 41159_at | Clathrin heavy polypeptide Hc | CLTC | D21260 | 0.7283 | Above
8 | 39728_at | interferon gamma-inducible protein 30 | IFI30 | J03909 | 0.7138 | Below
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 |  | 0.706[?] | Above
10 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | 0.7049 | Below
11 | 41438_at | KIAA1451 protein | KIAA1451 | AL049923 | 0.6999 | Below
12 | 34370_at | Archain 1 | ARCN1 | X81198 | 0.6999 | Below
13 | 36029_at | chromosome 11 open reading frame 8 | C11ORF8 | U57911 | 0.6964 | Above
14 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 0.6947 | Above
15 | 35869_at | MD-1 RP105-associated | MD-1 | AB020499 | 0.6908 | Below
16 | 36601_at | Vinculin | VCL | M33308 | 0.6908 | Below
17 | 40775_at | Integral membrane protein 2A | ITM2A | AL021786 | 0.6879 | Above
18 | 37281_at | KIAA0233 gene product | KIAA0233 | D87071 | 0.6837 | Below
19 | 957_at | Arrestin, beta 2 | ARRB2 | HG2059-HT2114 | 0.6744 | Below
20 | 33284_at | myeloperoxidase | MPO | M19507 | 0.6712 | Below
21 | 40585_at | adenylate cyclase 7 | ADCY7 | D25538 | 0.6712 | Below
22 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 0.6656 | Above
23 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 0.6581 | Below
24 | 38576_at | H2B histone family member B | H2BFB | AJ223353 | 0.6576 | Below
25 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.6576 | Below
26 | [missing] | [...] factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | 0.6576 | Below
27 | 33924_at | KIAA1091 protein | KIAA1091 | AB029014 | 0.6484 | Below
28 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 0.6466 | Above
29 | 33358_at | EST (retina) |  | W29087 | 0.6457 | Above
30 | 33740_at | chromosome 1 open reading frame 2 | C1ORF2 | AF023268 | 0.6441 | Below
31 | 36588_at | KIAA0810 protein | KIAA0810 | AB018353 | 0.6441 | Below
32 | 38802_at | progesterone binding protein | HPR6.6 | Y12711 | 0.6441 | Below
33 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6440 | Below
34 | 32227_at | proteoglycan 1 secretory granule | PRG1 | X17042 | 0.6409 | Below
35 | 34840_at | Homo sapiens cDNA FLJ22642 fis clone HSI06970 |  | [illegible] |  | Below
36 | 1131_at | mitogen-activated protein kinase kinase 2 | MAP2K2 | L11285 | 0.6409 | Below
37 | 33410_at | integrin alpha 6 | ITGA6 |  |  | Above
38 | 38006_at | CD48 antigen B-cell membrane protein | CD48 | M37766 | 0.6342 | Below
39 | 33907_at | eukaryotic translation initiation factor 4 gamma 3 | EIF4G3 | AF012072 | 0.6304 | Below
40 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 0.6304 | Below
41 | 39781_at | insulin-like growth factor-binding protein 4 | IGFBP4 | U20982 | 0.6301 | Below
42 | 39893_at | guanine nucleotide binding protein G protein gamma 7 | GNG7 | AB010414 | 0.6301 | Below
43 | 37326_at | proteolipid protein 2 colonic epithelium-enriched | PLP2 | U93305 | 0.6267 | Below
44 | 36687_at | cytochrome c oxidase subunit VIIb | COX7B | N50520 | 0.6266 | Below
45 | 40423_at | KIAA0903 protein | KIAA0903 | AB020710 | 0.6254 | Above
46 | 32542_at | four and a half LIM domains 1 | FHL1 | AF063002 | 0.6236 | Below
47 | 33232_at | cysteine-rich protein 1 intestinal | CRIP1 | AI017574 | 0.6211 | Below
48 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.6208 | Above
49 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 0.6208 | Above
50 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6199 | Below

Table 28. Genes selected by Wilkins' for T-ALL 



No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 38242_at | B cell linker protein | SLP65 | AF068180 | 0.8683 | Below
2 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | 0.8422 | Below
3 | 1096_g_at | CD19 antigen | CD19 | M28170 | 0.8181 | Below
4 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | 0.8128 | Below
5 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 0.8127 | Below
6 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | 0.8053 | Below
7 | 38147_at | SH2 domain protein 1A Duncan's disease lymphoproliferative syndrome | SH2D1A | AL023657 | 0.8016 | Above
8 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | 0.7914 | Below
9 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 0.7900 | Above
10 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | 0.7867 | Below
11 | 38521_at | CD22 antigen | CD22 | X59350 | 0.7856 | Below
12 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | 0.7835 | Below
13 | 34033_s_at | leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2 | LILRA2 | AF025531 | 0.7761 | Below
14 | 36638_at | connective tissue growth factor | CTGF | X78947 | 0.7755 | Below
15 | 38213_at | galactosidase alpha | GLA | U78027 | 0.7701 | Below
16 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | 0.7693 | Below
17 | 37711_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | 0.7560 | Below
18 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | Z49194 | 0.7440 | Below
19 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 0.7426 | Above
20 | 38894_g_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 0.7422 | Below
21 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | 0.7414 | Below
22 | 38017_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 0.7360 | Below
23 | 41156_g_at | catenin cadherin-associated protein alpha 1 102kD | CTNNA1 | U03100 | 0.7315 | Below
24 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 0.7292 | Below
25 | 37710_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | L08895 | 0.7283 | Below
26 | 41155_at | catenin cadherin-associated protein alpha 1 102kD | CTNNA1 | U03100 | 0.7278 | Below
27 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 0.7258 | Below
28 | 34224_at | fatty acid desaturase 3 | FADS3 | AC004770 | 0.7254 | Below
29 | 38604_at | neuropeptide Y | NPY | AI198311 | 0.7212 | Below
30 | 36773_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M81141 | 0.7197 | Below
31 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | 0.7180 | Below
32 | 36502_at | PFTAIRE protein kinase 1 | PFTK1 | AB020641 | 0.7179 | Below
33 | 37180_at | phospholipase C gamma 2 phosphatidylinositol-specific | PLCG2 | X14034 | 0.7114 | Below
34 | 38893_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 0.7100 | Below
35 | 387_at | cyclin-dependent kinase 9 CDC2-related kinase | CDK9 | X80230 | 0.7024 | Below
36 | 32035_at | Human MHC class II HLA-DRw53-associated glycoprotein beta-chain mRNA complete cds |  | M16942 | 0.6992 | Below
37 | 41153_f_at | Homo sapiens alphaE-catenin (CTNNA1) gene | CTNNA1 | [illegible] | [illegible] | Below
38 | 40780_at | C-terminal binding protein 2 | CTBP2 | AF016507 | 0.6976 | Below
39 | 40775_at | integral membrane protein 2A | ITM2A |  | 0.6952 | Above
40 | 39402_at | interleukin 1 beta | IL1B | M15330 | 0.6945 | Below
41 | 38522_s_at | CD22 antigen | CD22 | X52785 | 0.6945 | Below
42 | 41166_at | immunoglobulin heavy constant mu | IGHM | X58529 | 0.6941 | Below
43 | 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | 0.6937 | Below
44 | 38833_at | Human mRNA for SB classII histocompatibility antigen alpha-chain |  | X00457 | 0.69[?]5 | Below
45 | 2047_s_at | junction plakoglobin | JUP | M23410 | 0.6920 | Below
46 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 0.6899 | Above
47 | 40688_at | linker for activation of T cells | LAT | AJ223280 | 0.6898 | Above
48 | 39389_at | CD9 antigen p24 | CD9 | M38690 | 0.6879 | Below
49 | 33162_at | Insulin receptor | INSR | X02160 | 0.6879 | Below
50 | 31891_at | chitinase 3-like 2 | CHI3L2 | U58515 | 0.6872 | Above






Table 29. Genes Selected by Wilkins' for TEL-AML1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 37780_at | Piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 0.7121 | Above
2 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 0.7086 | Above
3 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 |  | [illegible] | Above
4 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 |  | 0.6718 | Above
5 | 32730_at | Homo sapiens mRNA for KIAA1750 protein partial cds |  |  | [illegible] | Above
6 | 34194_at | Homo sapiens cDNA [illegible] fis clone COL09740 |  | AL0493[?] | 0.6518 | Above
7 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 0.6160 | Above
8 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 0.6058 | Above
9 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 0.6056 | Above
10 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 |  | 0.6022 | Above
11 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 |  | Above
12 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 0.5976 | Above
13 | 35362_at | Myosin X | MYO10 | [illegible] |  | Above
14 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 0.5888 | Above
15 | 39329_at | Actinin alpha 1 | ACTN1 | X15804 |  | Below
16 | 1936_s_at | proto-oncogene c-myc, alt. transcript 3, ORF114 |  | [illegible]-HT4899 | 0.5761 | Below
17 | 33690_at | Homo sapiens mRNA cDNA DKFZp434A202 | DKFZp434A202 | [illegible] | 0.5725 | Above
18 | 39389_at | CD9 antigen p24 | CD9 | M38690 | 0.5684 | Below
19 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 0.5642 | Above
20 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | 0.5585 | Above
21 | 38652_at | hypothetical protein FLJ20154 | FLJ20154 | AF070644 | 0.5563 | Above
22 | 38763_at | (clone D21-1) L-iditol-2 dehydrogenase gene |  | L29254 | 0.5535 | Below
23 | 37724_at | v-myc avian myelocytomatosis viral oncogene homolog | MYC | [illegible] | [illegible] | Below
24 | 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | 0.5506 | Below
25 | [illegible] | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 0.5482 | Above
26 | 41549_s_at | adaptor-related protein complex 1 sigma 2 subunit | AP1S2 | [illegible] | [illegible] | Below
27 | [illegible] | hypothetical protein | FLJ20500 | AA522530 | 0.5471 | Below
28 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 0.5459 | Above
29 | 31786_at | Sam68-like phosphotyrosine protein T-STAR | T-STAR | AF051321 | 0.5403 | Above
30 | 38570_at | major histocompatibility complex class II DO beta | HLA-DOB | X03066 | 0.5384 | Above
31 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 0.5375 | Below
32 | 36493_at | lymphocyte-specific protein 1 | LSP1 | M33552 | 0.5356 | Below
33 | 574_s_at | caspase 1 apoptosis-related cysteine protease interleukin 1 beta convertase | CASP1 | M87507 | 0.5336 | Below
34 | 32224_at | KIAA0769 gene product | KIAA0769 | AB018312 | 0.5326 | Above
35 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 0.5302 | Above
36 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.5283 | Above
37 | 41200_at | CD36 antigen collagen type I receptor thrombospondin receptor like 1 | CD36L1 | Z22555 | 0.5261 | Above
38 | 36009_at | hypothetical protein | CL683 | AF091092 | 0.5259 | Below
39 | 36933_at | N-myc downstream regulated | NDRG1 | D87953 | 0.5254 | Below
40 | 1126_s_at | Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform. | CD44 | L05424 | 0.5232 | Below
41 | 39824_at | ESTs |  | AI391564 | 0.5231 | Above
42 | 38078_at | filamin B beta actin-binding protein-278 | FLNB | AF042166 | 0.5208 | Below
43 | 38127_at | syndecan 1 | SDC1 | Z48199 | 0.5199 | Above
44 | 32941_at | interferon consensus sequence binding protein 1 | ICSBP1 | M91196 | 0.5195 | Below
45 | 37276_at | IQ motif containing GTPase activating protein 2 | IQGAP2 | U51903 | 0.5191 | Below
46 | 34768_at | DKFZP564E1962 protein | DKFZP564E1962 | AL080080 | 0.5184 | Below
47 | 39781_at | insulin-like growth factor-binding protein 4 | IGFBP4 | U20982 | 0.5173 | Below
48 | 37918_at | integrin beta 2 antigen CD18 p95 lymphocyte function-associated antigen 1 macrophage antigen 1 mac-1 beta subunit | ITGB2 | M15395 | 0.5162 | Below
49 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 0.5155 | Below
50 | 41814_at | fucosidase alpha-L-1 tissue | FUCA1 | M29877 | 0.5101 | Above



5. SOM/DAV 

The 10,991 probe sets that passed the variation filter were used for subsequent 
selection of discriminating genes using the self-organizing map (SOM) and 
discriminant analysis with variance (DAV) programs in the GeneMaths software 
package (version 1.5, Applied Maths, Belgium). The subgroups for which genes were 
selected included T-lineage ALL, TEL-AML1, E2A-PBX1, MLL rearrangement, BCR- 
ABL, hyperdiploid ALL (chromosomal number > 50) and the novel subgroup 
described in the text of the paper. The target number of total genes chosen by each 
algorithm was 500. 

The SOM analysis was performed using a 30 x 18 node format to enable an 
optimal number of genes per node (~20 genes per node). Nodes that contained genes 
whose expression varied more than 2-fold from the mean in more than 70% of the 
samples in a particular subgroup were chosen. A total of 451 genes were chosen 
using the SOM algorithm and 443 genes using the DAV algorithm. The combined 
gene sets contained 755 unique genes, of which 185 were present in both subsets. 2-D 
hierarchical clustering of the genes and samples was performed using Pearson's 
correlation coefficient as the metric and the unweighted pair group method using 
arithmetic averages (UPGMA). Approximately 10% of the genes that were found to 
have correlation coefficients less than 0.7 in each branch of the dendrogram were 
removed, and the process was repeated iteratively until the correlation coefficient for 
all genes within a branch was > 0.7, or until the removal of additional genes resulted in 
a deterioration of the class distinction as indicated by inappropriate clustering of 
cases. Through this approach a subset of 215 genes was selected that optimally 
separated the 7 subgroups. These genes are listed in Tables 30-36. The selection of 
genes by this approach does not provide for a ranking. For class prediction, between 
20 and 30 genes were used for each genetic subgroup, unless otherwise stated. 
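
The two numerical rules in the paragraph above (the 2-fold / 70% variation 
criterion and the correlation-based pruning of dendrogram branches) can be 
illustrated with a minimal sketch. This is not the GeneMaths implementation. 
The function names, the data layout (a gene's profile as a list of expression 
values, one per sample) and the reading of the pruning step (correlation of each 
gene with the mean profile of its branch, with at most about 10% of the branch 
removed per round, lowest correlation first) are assumptions made for 
illustration only. 

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def varies_in_subgroup(profile, subgroup_idx, fold=2.0, fraction=0.70):
    """True if the gene differs from its mean over all samples by more than
    `fold` (up or down) in more than `fraction` of the subgroup's samples.
    Assumes positive expression values; `subgroup_idx` lists sample positions."""
    overall = mean(profile)
    hits = sum(
        1 for i in subgroup_idx
        if profile[i] > overall * fold or profile[i] < overall / fold
    )
    return hits > fraction * len(subgroup_idx)


def prune_once(branch, threshold=0.7, drop_fraction=0.10):
    """One pruning round for a dendrogram branch (hypothetical helper).
    `branch` maps gene name -> expression profile. Genes whose correlation
    with the branch mean profile is below `threshold` are candidates; at most
    about `drop_fraction` of the branch is removed, lowest correlation first."""
    genes = list(branch)
    n_samples = len(branch[genes[0]])
    centroid = [mean(branch[g][i] for g in genes) for i in range(n_samples)]
    scores = {g: pearson(branch[g], centroid) for g in genes}
    low = sorted((g for g in genes if scores[g] < threshold), key=scores.get)
    if not low:
        return dict(branch)  # every gene already meets the threshold
    n_drop = min(len(low), max(1, round(drop_fraction * len(genes))))
    dropped = set(low[:n_drop])
    return {g: p for g, p in branch.items() if g not in dropped}
```

In use, `prune_once` would be called repeatedly on each branch until it returns 
the branch unchanged, which corresponds to the stopping condition that all 
remaining genes correlate above 0.7. The second stopping condition in the text 
(deterioration of the class distinction) requires re-clustering the cases and is 
not modelled here. 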



Table 30. Genes selected by DAV-SOM for BCR-ABL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 39250_at | nephroblastoma overexpressed gene | NOV | X96584 | Above
2 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | Above
3 | 38312_at | DKFZp5640222 from clone DKFZp5640222 |  | AL050002 | Above
4 | 38342_at | KIAA0239 protein | KIAA0239 | D87076 | Above
5 | 39712_at | S100 calcium-binding protein A13 | S100A13 | AI541308 | Above
6 | 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | Above
7 | 39781_at | Insulin-like growth factor-binding protein 4 | IGFBP4 | U20982 | Above
8 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | Above
9 | 40504_at | paraoxonase 2 | PON2 | AF001601 | Above
10 | 33362_at | Cdc42 effector protein 3 | CEP3 | AF094521 | Above
11 | 33404_at | adenylyl cyclase-associated protein 2 | CAP2 | U02390 | Above
12 | 34362_at | solute carrier family 2 facilitated glucose transporter member 5 | SLC2A5 | M55531 | Above
13 | 36591_at | Tubulin alpha 1 testis specific | TUBA1 | X06956 | Above
14 | 38077_at | collagen type VI alpha 3 | COL6A3 | X52022 | Above
15 | 40196_at | HYA22 protein | HYA22 | D88153 | Above
16 | 1911_s_at | Growth arrest and DNA-damage-inducible alpha | GADD45A | M60974 | Above
17 | 1702_at | interleukin 2 receptor alpha | IL2RA | X01057 | Above
18 | 1635_at | Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds. | ABL | U07563 | Above
19 | 1636_g_at | Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds. | ABL | U07563 | Above
20 | 1326_at | Caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | Above
21 | 330_s_at | Tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | Above



Table 31. Genes selected by DAV-SOM for E2A-PBX1 



No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | Above
2 | 37479_at | CD72 antigen | CD72 | M54992 | Above
3 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | Above
4 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | Above
5 | 39929_at | KIAA0922 protein | KIAA0922 | AB023139 | Above
6 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | Above
7 | 41017_at | Myosin-binding protein H | MYBPH | U27266 | Above
8 | 41425_at | Friend leukemia virus integration 1 | FLI1 | M98833 | Above
9 | 41862_at | KIAA0056 protein | KIAA0056 | D29954 | Above
10 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | Above
11 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | Above
12 | 38285_at | mu-crystallin gene |  | AF039397 | Above
13 | 38286_at | KIAA1071 protein | KIAA1071 | AB028994 | Above
14 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | Above
15 | 39379_at | cDNA DKFZp586C1019 from clone DKFZp586C1019 |  | AL049397 | Above
16 | 39402_at | interleukin 1 beta | IL1B | M15330 | Above
17 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | Above
18 | 41139_at | melanoma antigen family D 1 | MAGED1 | W26633 | Above
19 | 41146_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | Above
20 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 |  | AL049381 | Above
21 | 34783_s_at | BUB3 budding uninhibited by benzimidazoles 3 yeast homolog | BUB3 | AF047473 | Above
22 | 36179_at | mitogen-activated protein kinase-activated protein kinase 2 | MAPKAPK2 | U12779 | Above
23 | 36589_at | aldo-keto reductase family 1 member B1 aldose reductase | AKR1B1 | X15414 | Above
24 | [illegible] | KIAA0[?]47 gene product | KIAA0[?]47 | D87434 | Above
25 | [illegible] | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 p105 |  | [illegible] | Above
26 | 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | Above
27 | [illegible] | interleukin 1 beta | IL1B |  | Above
28 | [illegible] | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT |  | Above
29 | 854_at | B lymphoid tyrosine kinase | BLK | SI 6611 | Above
30 | 753_at | Nidogen 2 | NID2 | D86425 | Above
31 | [illegible] | nucleoside phosphorylase | NP | X00737 | Above
32 | 362_at | Protein kinase C zeta | PRKCZ | Z15108 | Above




Table 32. Genes selected by DAV/SOM for Hyperdiploid >50 




Affymetrix 
number 

1 'XfJlQ** at 


Gene Name 

prosaposin variant Gaucher disease and 
variant metachromatic leukodystrophy 


GeneSymbol 

PSAP 


Reference 
number 

J03077 


Above/ 
Below 
Mean 

Above 


7 ^8747 of 


B cell linker protein 


SLP65 


AF068180 


Above 


O jojIo at 


sex comb on midleg Drosophila like 2 


SCML2 


Y18004 


Above 




RAB9 member RAS oncogene family 


RAB9 


U44103 


Above 




KIAA0179 protein 


KIAA0179 


D80001 


Above 


6 33228_g_at 


interleukin 10 receptor beta 


IL10RB 


AI9S4234 


Above 


7 33753_at 


K1AA0666 protein 


KIAA0666 


AB014566 


Above 


5? 77^/17 s>t 

o j /j>*+j at 


Rac/Cdc42 guanine exchange factor GEF 6 ARHGEF6 


D25304 


Above 


0 7S0^R at 
at 


SH3-domain bhiding protein 5 BTK- 
associated 


SH3BP5 


AB005047 


Above 


i n 7on7Q c ot 


CGI-76 protein 


LOC51632 


AI557497 


Above 


11 39329_at 


Actinin alpha 1 


ACTN1 


X15804 


Above 


12 39389_at 


CD9 antigen p24 


CD9 


M38690 


Above 


1 7 "37707 ot 

1 3 3zzu /_at 


membrane protein palmitoylated 1 55kD 


MPP1 


M64925 


Above 


1 4 QOOIA ot 


ubiquitin-conjugating enzyme E2G 2 
homologous to yeast UBC7 


UBE2G2 


AF032456 


Above 


15 3225 l_at 


hypothetical protein FLJ21 174 


FLJ21174 


AA149307 


Above 


10 dd /o z t at 


chromosome X open reading frame 5 


OFD1 


Y15164 


Above 


17 36620_at 


superoxide dismutase 1 soluble 
amyotrophic lateral sclerosis 1 adult 


SOD1 


X02317 


Above 


18 36937_s_at 


PDZ and LEVI domain 1 elfin 


PDLEV11 


U90878 


Above 


19 37326_at 


proteolipid protein 2 colonic epithelium- 
enriched 


PLP2 


U93305 


Above 
20 37350_at 



21 3S73S__at 

22 39168_at 

23 40903 at 



24 32572_at 

25 1065_at 

26 306 s at 



clone 889N1 5 on chromosome Xq22. 1- PSMD10 
22.3. Contains part of the gene for a novel 
protein similar to X. laevis Cortical 
Thymocyte Marker CTX 

SMT3 suppressor of mif two 3 yeast SMT3H1 
homolog 1 

Ac-like transposable element ALTE 
ATPase H transporting lysosomal vacuolar APT6M8-9 
proton pump membrane sector associated 
protein M8-9 

ubiquitin specific protease 9 X chromosome USP9X 
Drosophila fat facets related 



fms-related tyrosine kinase 3 

high-mobility group nonhistone 
chromosomal protein 14 



FLT3 
HMG14 



PCT/US03/084H6 
AL031177 Above 



X99584 

AB01S328 
AL049929 



X98296 

U02687 
J02621 



Above 

Above 
Above 



Above 

Above 
Above 



Affymetrix Gene Name 
number 

1 31492_at Muscle specific gene 

2 36777_at DNA segment on chromosome 12 unique 

2489 expressed sequence 

3 39301_at Calpain 3 p94 

4 41 448_at Homeo box A4 

5 39424_at tumor necrosis factor receptor superfamily 

member 14 herpesvirus entry mediator 

6 40076_at Tumor protein D52-like 2 

7 40493_at Human cell surface glycoprotein CD44 

(CD44) gene, 3* end of long tailed isoform. 

8 40506_s_at Homo sapiens polyadenylate binding 

protein mRNA, complete cds. 

9 405 1 4_at hypothetical 43 .2 Kd protein 

10 40763_at Meisl mouse homolog 

1 1 40797_at a disintegrin and metalloproteinase domain 

10 

12 4079S_s_at a disintegrin and metalloproteinase domain 

10 

13 41 747_s_at myocyte-specific enhancer factor 2 A 

(MEF2A) gene 

14 32193_at PlexinCl 

15 3221 5_i_at KIAA0878 protein 

16 33412_at LGALS1 Lectin, galactoside-binding, 

soluble, 1 (galectin 1) 

1 7 34306_at muscleblind Drosophila like 

18 34785_at KIAA1025 protein 



GeneSymbol 

M9 

D12S2489E 


Reference 
number 

AB019392 
AJ001687 


Above/ 
Below 
Mean 

Above 

Above 


CAPN3 
HOXA4 
TNFRSF14 


X85030 

ACUU4U5U 

U70321 


Below 
Above 
Below 


TPD52L2 
CD44 


AF004430 
L05424 


Above 
Above 




U75686 


Above 


LOC51614 
MEIS1 
ADAM 10 


AF091085 

U85707 

AF009615 


Above 
Above 
Above 


ADAM10 


Z48579 


Above 


MEF2A 


U49020 


Above 


PLXNC1 

KIAA0878 

LGALS1 


AF030339 
AB020685 
AI535946 


Above 
Above 
Above 


MBNL 
KIAA1025 


AB007888 
AB028948 


Above 
Above 
1 0 at 
Ly JJZ70 al 


eukaryotic translation initiation factor 3 


EIF3S7 


U54558 


Above 




subunit 7 zeta 66/67kD 








20 36690_at 


Nuclear receptor subfamily 3 group C 


NR3C1 


M10901 


Above 




member 1 








21 37675_at 


solute carrier family 25 mitochondrial 






Above 




carrier phosphate carrier member 3 








22 38391„at 


cappmg protem actm filament gelsolm-like 


CAPG 


TV AClA T/1C 

JVly4343 


Above 


23 3S413_at 


defender against cell death 1 


DAD1 


D 15057 


Above 


24 39110_at 


eukaryotic translation initiation factor 4B 


EIF4B 


X55733 


Above 


25 39867_at 


Tu translation elongation factor 


TUFM 


S75463 


Above 




mitochondrial 






Above 


26 2062_at 


Insulin-like growth factor binding protein 7 


LKjr Jt>r / 




27 2036_s_at 


CD44 antigen homing function and Indian 


CD44 




Above 




blood group system 








28 1914_at 


Cyclin Al 


CCNA1 


U6683b 


Above 


29 1327_s_at 


mitogen-activated protein kinase kinase 


MAP3K5 


T TZT^7 1 C/Z 

U67156 


Above 




iS 1 lid -J 








30 1126_s_at 


Human cell surface glycoprotein CD44 


CD44 


L05424 


Above 




(CD44) gene, 3' end of long tailed isoform. 








31 1102_s_at 


Nuclear receptor subfamily 3 group C 


NR3C1 


M10901 


Above 




member 1 








32 873_at 


homeo box A5 






Above 


33 706_at 


Glucocorticoid receptor, beta 




HG4582- 


Above 






HT4987 




34 657_at 


protocadherin gamma subfamily C 3 


PCDHGC3 


LI 1373 


Above 




Table 34. Genes selected by DAV/SOM for Novel Class 




AiiymciriA 


Gene Name 


GeneSymbol 


Reference 


Above/ 


number 






number 


Below 










Mean 


1 33137_at 


latent transforming growth factor beta 


LTBP4 


Y 13622 


Above 




binding protein 4 








2 3808 l_at 


leukotriene A4 hydrolase 


LTA4H 


J03459 


Above 


3 38661_at 


seb4D 


HSRNASEB 


X75314 


Above 


4 39878_at 


protocadherin 9 


PCDH9 


AI524125 


Above 


5 35260_at 


KIAA0867 protein 


MONDOA 


AB020674 


Above 


6 1373_at 


transcription factor 3 E2A immunoglobulin 


TCF3 


M31523 


Above 




enhancer binding factors E12/E47 








7 35177_at 


KIAA0725 protein 


KIAA0725 


AB018268 


AUUVC 


8 38618_at 


Human PAC clone RP3-515N1 from 


L1MK2 


AC002073 


Above 




22qll.2-q22 








9 34947_at 


phorbolin-like protein MDS019 


MDS019 


AA442560 


Above 


10 40692_at 


transducin-like enhancer of split 4 homolog 


TLE4 


M99439 


Above 




of Drosophila E spl 








11 38364_at 


BCE-1 protein 


BCE-1 


AF068197 


Above 


12 37960_at 


carbohydrate chondroitin 6/keratan 


CHST2 


AB014679 


Above 




sulfotransferase 2 
13 994_at 


Protein tyrosine phosphatase receptor type 

M . 

Protein tyrosine phosphatase receptor type 
M 

Protein tyrosine phosphatase receptor type 
M 

G protein-coupled receptor 49 


PTPRM 


X58288 


Above 


14 31892_at 






Above 


15 995_g_at 


PTPRM 


X58288 


Above 


16 41073_at 


GPR49 


AI743745 


Above 


17 41708_at 


KIAA1034 protein 


KIAA1034 


AB028957 


Above 


18 34376_at 


protein kinase cAMP-dependent catalytic 
inliibitor gamma 


PKIG 


AB019517 


Below 


19 37978_at 


quinolinate phosphoribosyltransferase 
nicotinate-nucleotide pyrophosphorylase 
carboxylating 


QPRT 


D7S177 


Below 


20 38717_at 


DKFZP5S6A0522 protein 


DKFZP586A05 AL050159 


Below 


21 33999_f^at 


Human L2-9 transcript of unrearranged 
immunoglobulin V H 5 pseudogene 




X58398 


Above 


22 36181_at 


LIM and SH3 protein 1 


LASP1 


X82456 


Below 


23 41202_s_at 


conserved gene amplified m osteosarcoma 


OS4 


AF000152 


Above 


24 41138_at 


Antigen identified by monoclonal 
antibodies 1 ?F7 F21 and Ol^ 


MIC2 


Ml 6279 


Below 


25 4077 l_at 


Moesin 


MSN 


Z98946 


Above 


26 39070_at 


singed Drosophila like sea urchin fascin 
homolog like 


SNL 


U03057 


Below 


27 32562_at 


endoglin Osler-Rendu- Weber syndrome 1 


ENG 


X72012 


Below 


28 36536_at 


schwannornin interacting protein 1 


SCHIP-1 


AF070614 


Below 


29 36650_at 


cyclin D2 


CCND2 


D13639 


Below 


30 39756_g_at 


X-box binding protein 1 


XBP1 


Z93930 


Above 


31 34168_at 


deoxynucleotidyltransferase tenninal 


DNTT 


Ml 1722 


Above 


32 1389_at 

33 41213_at 


membrane metallo-endopeptidase neutral 
endopeptidase enkephalinase CALLA 
CD10 

peroxiredoxin 1 


MME 
PRDX1 


J03779 
X67951 


Below 
Above 


34 36571_at 


Topoisomerase DNA H beta 180kD 


TOP2B 


X68060 


Above 


35 253_g_at 


clone GPCR W G protein-linked receptor 
gene (GPCR) gene, 5' end of cds. 




L42324 


Below 


36 252_at 


clone GPCR W G protein-linked receptor 
gene (GPCR) gene, 5' end of cds. 




L42324 


Above 


37 2087_s_at 


cadherin 1 1 type 2 OB-cadherin osteoblast 


CDH11 


D21254 


Above 


38 36976_at 


cadherin 1 1 type 2 OB-cadherin osteoblast 


CDH11 


D21255 


Above 



Table 35. Genes selected by DAV/SOM for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 35016_at | Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3). | | M13560 | Below
2 | 36277_at | membrane protein (CD3-epsilon) gene | CD3E | M23323 | Above
3 | 38147_at | SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome | SH2D1A | AL023657 | Above
4 | 38949_at | protein kinase C theta | PRKCQ | L01087 | Above
5 | 32649_at | transcription factor 7 T-cell specific HMG-box | TCF7 | X59871 | Above
6 | 33238_at | Human T-lymphocyte specific protein tyrosine kinase p56lck (LCK) aberrant mRNA, complete cds. | LCK | U23852 | Above
7 | 35643_at | nucleobindin 2 | NUCB2 | X76732 | Above
8 | 36473_at | ubiquitin specific protease 20 | USP20 | | Above
9 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | Above
10 | 39709_at | selenoprotein W 1 | SEPW1 | U67171 | Above
11 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | Above
12 | 32794_g_at | T cell receptor beta locus | TRB | [illegible] | Above
13 | 37039_at | major histocompatibility complex class II DR alpha | HLA-DRA | J00194 | Below
14 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | Above
15 | 38095_i_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | Below
16 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | Below
17 | 38415_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | Above
18 | 38833_at | Human mRNA for SB classII histocompatibility antigen alpha-chain | | X00457 | Below
19 | 2059_s_at | lymphocyte-specific protein tyrosine kinase | LCK | [illegible] | Above
20 | 1241_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | Above
21 | 1105_s_at | T cell receptor beta locus | TRB | M12886 | Above



Table 36. Genes selected by DAV/SOM for TEL-AML1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 31508_at | upregulated by 1,25-dihydroxyvitamin D-3 | VDUP1 | S73591 | Above
2 | 33690_at | cDNA DKFZp434A202 from clone DKFZp434A202 | | AL080190 | Above
3 | 34481_at | vav proto-oncogene, exon 27, and complete cds. | VAV | AF030227 | Above
4 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | Z49194 | Above
5 | 37470_at | Leukocyte-associated Ig-like receptor 1 | LAIR1 | AF013249 | Above
6 | 38203_at | Potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | Above
7 | 38570_at | major histocompatibility complex class II DO beta | HLA-DOB | X03066 | Above
8 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | Above
9 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | Above
10 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | Above
11 | 40745_at | adaptor-related protein complex 1 beta 1 subunit | AP1B1 | L13939 | Above
12 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | Above
13 | 41381_at | KIAA0308 protein | KIAA0308 | AB002306 | Above
14 | 41442_at | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | AB010419 | Above
15 | 31898_at | KIAA0212 gene product | KIAA0212 | D86967 | Above
16 | 32660_at | KIAA0342 gene product | KIAA0342 | AB002340 | Above
17 | 34194_at | cDNA FLJ21697 fis clone COL09740 | | AL049313 | Above
18 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | Above
19 | 35665_at | Phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | Above
20 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | Above
21 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | Above
22 | 36537_at | Rho-specific guanine nucleotide exchange factor p114 | P114-RHO-GEF | AB011093 | Above
23 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | Above
24 | 38652_at | hypothetical protein FLJ20154 | FLJ20154 | AF070644 | Above
25 | 41200_at | CD36 antigen collagen type I receptor thrombospondin receptor like 1 | CD36L1 | Z22555 | Above
26 | 32224_at | KIAA0769 gene product | KIAA0769 | AB018312 | Above
27 | 36985_at | isopentenyl-diphosphate delta isomerase | IDI1 | X17025 | Above
28 | 38124_at | midkine neurite growth-promoting factor 2 | MDK | X55110 | Above
29 | 39824_at | ESTs | | AI391564 | Above
30 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | Above
31 | 41498_at | KIAA0911 protein | KIAA0911 | AB020718 | Above
32 | 41814_at | fucosidase alpha-L-1 tissue | FUCA1 | M29877 | Above
33 | 32579_at | SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 4 | SMARCA4 | D26156 | Above
34 | 33162_at | insulin receptor | INSR | X02160 | Above
35 | 1779_s_at | pim-1 oncogene | PIM1 | M16750 | Above
36 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | Above
37 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | | [illegible] | Above
38 | 1336_s_at | protein kinase C beta 1 | PRKCB1 | [illegible] | Above
39 | 1299_at | Telomeric repeat binding factor 2 | TERF2 | | Above
40 | 1217_g_at | protein kinase C beta 1 | | X07109 | Above
41 | 1077_at | recombination activating gene 1 | RAG1 | [illegible] | Above
42 | 932_i_at | zinc finger protein [illegible] | [illegible] | [illegible] | Above
43 | 880_at | FK506-binding protein 1A 12kD | FKBP1A | [illegible] | Above
44 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | Above
45 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | Above
46 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | Above



C. Comparison of genes selected by the different metrics

There is a high degree of overlap between the genes chosen by the various
metrics; however, the top-ranked genes for each metric differ. Despite this, the top
genes selected by the various metrics are all able to accurately identify the leukemia
risk groups, as detailed below. As a result, a limited number of genes can be used to
accurately identify the genetic subtypes, and one can use non-overlapping lists and still
achieve high prediction accuracy. Thus, there are many genes that are distinct
discriminators of these seven risk groups, and one need only use a small subset of
these in a supervised learning algorithm to accurately identify a case as belonging to
a given genetic subtype.

D. Decision tree for the diagnosis of genetic subtypes 

Classification was approached using a decision tree format, in which the first 
decision was T-ALL versus B-lineage (non-T-ALL). Within the B-lineage subset, 
cases were then sequentially classified into the known risk groups characterized by 
the presence of E2A-PBX1, TEL-AML1, BCR-ABL, MLL chimeric genes, and lastly 
hyperdiploid >50 chromosomes. Cases not assigned to one of these classes were left 
unassigned. Classification was performed using the supervised learning algorithms 
described below. 
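
The sequential decision format described above can be sketched as a cascade of binary classifiers. This is only an illustration: the function name `classify_sample` and the callable classifiers are hypothetical stand-ins for the trained supervised learners, which are not specified at this level of detail in the text.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Order of the binary decisions, following the sequence given in the text.
SUBTYPE_ORDER: List[str] = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid>50",
]

# A binary classifier maps an expression profile to True (the sample belongs
# to the subtype) or False. These callables are hypothetical placeholders.
BinaryClassifier = Callable[[Sequence[float]], bool]


def classify_sample(
    profile: Sequence[float],
    classifiers: Dict[str, BinaryClassifier],
) -> Optional[str]:
    """Walk the cascade and return the first subtype whose classifier accepts.

    Returns None when no classifier accepts the sample, i.e. the case is
    left unassigned.
    """
    for subtype in SUBTYPE_ORDER:
        if classifiers[subtype](profile):
            return subtype
    return None
```

Because the decisions are made in a fixed order, a sample accepted by an earlier classifier is never shown to a later one, which mirrors the "T-ALL first, hyperdiploid last" ordering in the text.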

E. Description of Supervised Learning Algorithms 

An analysis of the profiles was performed using a linear classifier, C4.5, and a variety
of different non-linear classifiers. The non-linear classifiers consistently outperformed
the linear classifier. Therefore, only the description and data from non-linear
classifiers are included below.

1. Support Vector Machine (SVM)

A support vector machine (SVM) selects a small number of critical boundary
instances from each class and builds a linear discriminant function that separates them
as widely as possible (Witten and Frank, Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999, herein
incorporated by reference). In the case where no linear separation is possible, the
technique of "kernel" is used to automatically inject the training instances into a
higher dimensional space and a separator is learned in that space. The Weka version
of SVM developed at the University of Waikato, New Zealand
(www.cs.waikato.ac.nz/ml/weka), which implements Platt's sequential minimal
optimization algorithm for training a support vector classifier using polynomial
kernels, was used (Platt, "Fast Training of Support Vector Machines Using Sequential
Minimal Optimization," Advances in Kernel Methods - Support Vector Learning,
Schölkopf et al., eds., MIT Press, 1998, herein incorporated by reference).
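
The kernel idea can be illustrated with a small numerical check. The sketch below assumes a degree-2 polynomial kernel of the form (x·y + 1)^2 on two-dimensional inputs; this specific form and the function names are illustrative assumptions and are not taken from the Weka implementation. It shows that evaluating the kernel gives the same value as an ordinary dot product in an explicit six-dimensional feature space.

```python
import math
from typing import List, Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float],
                degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel (x . y + coef0) ** degree (assumed form)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def explicit_map_degree2(x: Sequence[float]) -> List[float]:
    """Explicit feature map matching poly_kernel(degree=2, coef0=1) in 2-D."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product in the explicit feature space."""
    return sum(a * b for a, b in zip(u, v))
```

A linear separator learned in the six-dimensional space corresponds to a quadratic boundary in the original two-dimensional space, which is why a kernel can help when no linear separation exists.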

2. Prediction by Collective Likelihood of Emerging Patterns (PCL)

Emerging patterns (EPs) are a notion used in data mining to discover sharp
differences between two classes of data (Dong and Li, "Efficient Mining of Emerging
Patterns: Discovering Trends and Differences," Proc. 5th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 43-52
(1999), herein incorporated by reference). An EP is a pattern (in our case, the
expression levels of several genes) whose frequency increases significantly from one
class of samples to another class. In particular, the most general patterns that have
infinite growth were identified, that is, patterns whose frequency in one class is 0% and
in another class is greater than 0%, and none of whose proper subpatterns are EPs. These
EPs can then be combined into reliable rules for subtype prediction. Three earlier
methods for classification based on EPs are JEP (Li et al. (2001) Knowledge and
Information Systems 3:131-45, herein incorporated by reference), DeEPs (Li et al.,
"DeEPs: Instance-based Classification by Emerging Patterns," Proc. 4th European
Conference on Principles and Practice of Knowledge Discovery in Databases, pp.
191-200, 2000, herein incorporated by reference), and CAEP (Dong et al., "CAEP:
Classification by Aggregating Emerging Patterns," Proc. 2nd International
Conference on Discovery Science, pages 30-42, 1999, herein incorporated by
reference).

In this analysis an original variation in the spirit of JEP but with a different
manner of aggregating EPs was used. Given two training data sets $D_P$ and $D_N$ and a
testing sample $T$, the first phase was to discover EPs from $D_P$ and $D_N$. Denote the EPs
of $D_P$, in descending order of frequency, as $\mathrm{TopEP}^P_1, \ldots, \mathrm{TopEP}^P_i$, and those of $D_N$ as
$\mathrm{TopEP}^N_1, \ldots, \mathrm{TopEP}^N_j$. Suppose $T$ contains the following EPs of $D_P$:
$\mathrm{TopEP}^P_{i_1}, \ldots, \mathrm{TopEP}^P_{i_x}$, where $i_1 < i_2 < \ldots < i_x \le i$; and the following EPs of $D_N$:
$\mathrm{TopEP}^N_{j_1}, \ldots, \mathrm{TopEP}^N_{j_y}$, where $j_1 < j_2 < \ldots < j_y \le j$. In the next step, two scores were
calculated for $T$:

$$\mathrm{score}_P = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^P_{i_m})}{\mathrm{frequency}(\mathrm{TopEP}^P_{m})}, \qquad \mathrm{score}_N = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^N_{j_m})}{\mathrm{frequency}(\mathrm{TopEP}^N_{m})},$$

where $k \ll i$ and $k \ll j$. In this case, $k$ is chosen to be 25. Finally, a prediction is made on $T$ as
follows: if $\mathrm{score}_P > \mathrm{score}_N$, then $T$ is predicted to be in class $D_P$; otherwise, it is
predicted as class $D_N$.

The spirit of this variation is to measure how far the top k EPs contained in T
are away from the top k EPs of a class. For example, if k = 1, then $\mathrm{score}_P$ indicates
whether the number-one EP contained in T is far from the most frequent EP of $D_P$. If
the score is the maximum value 1, then the "distance" is very close, namely the most
common property of $D_P$ is also present in this testing sample. With smaller scores, the
distance becomes greater and the likelihood of T belonging to $D_P$ becomes weaker.
Using more than one top-ranked EP in this way leads to very reliable predictions.
This variation of the EP-based classification method was termed "prediction by
collective likelihood of EPs" or PCL for short.
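
The scoring step of PCL can be sketched as follows. This is a minimal illustration under stated assumptions: EPs are represented as sets of discretized gene conditions (hypothetical strings such as "geneA=high"), the EP lists are already mined and sorted by descending frequency, and when the sample contains fewer than k EPs the sum simply runs over those available. None of these representation choices comes from the text.

```python
from typing import FrozenSet, Sequence, Set, Tuple

Pattern = FrozenSet[str]          # an EP as a set of discretized gene conditions
RankedEP = Tuple[Pattern, float]  # (pattern, frequency in its home class)


def pcl_score(sample_items: Set[str],
              ranked_eps: Sequence[RankedEP],
              k: int) -> float:
    """Sum of frequency(m-th EP contained in sample) / frequency(m-th top EP).

    ranked_eps must be sorted by descending frequency.
    """
    # Frequencies of the EPs that the sample contains, in rank order.
    contained = [freq for pattern, freq in ranked_eps if pattern <= sample_items]
    score = 0.0
    for m in range(min(k, len(contained))):
        score += contained[m] / ranked_eps[m][1]
    return score


def pcl_predict(sample_items: Set[str],
                eps_p: Sequence[RankedEP],
                eps_n: Sequence[RankedEP],
                k: int = 25) -> str:
    """Return "P" if score_P > score_N, otherwise "N"."""
    score_p = pcl_score(sample_items, eps_p, k)
    score_n = pcl_score(sample_items, eps_n, k)
    return "P" if score_p > score_n else "N"
```

With k = 1, a sample that contains the most frequent EP of a class receives the maximum score of 1 for that class, matching the interpretation given above.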

3. k-Nearest Neighbor (k-NN)

k-NN is a typical instance-based learner where the class of a new instance is
decided by the majority class of its k closest neighbors (Cover and Hart (1967) IEEE
Transactions on Information Theory 13:21-27, herein incorporated by reference).
This method was used with the Euclidean distance metric. Conceptually, this is one
of the most straightforward methods and is often used as a baseline for comparison
purposes. The data were normalized using the z-score method, then the "best" few
genes were chosen using one of the statistical gene selection methods. For these
experiments, the "top n" genes, where n = 1-50, were used. The expression values of
the top genes from each diagnostic sample were treated as a vector in n-dimensional
space. To classify a new sample, the same top n genes were chosen, and the
Euclidean distance was computed between this new vector and each vector in the
training data. The prediction was made by a majority vote of the k nearest samples,
where k = 1 or k = 3. In this experiment, k was set to 1.
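
The k-NN procedure can be sketched in a few lines. The sketch assumes that the z-score statistics are computed per gene from the training samples and then applied to the new sample, and that the gene selection has already reduced each profile to the chosen top n genes; both are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_zscore(train: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-gene (mean, standard deviation) from the training samples."""
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*train)]


def apply_zscore(row: Sequence[float],
                 stats: Sequence[Tuple[float, float]]) -> List[float]:
    """Z-score normalize one expression vector."""
    return [(v - mu) / sd for v, (mu, sd) in zip(row, stats)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train: Sequence[Sequence[float]], labels: Sequence[str],
                sample: Sequence[float], k: int = 1) -> str:
    """Majority vote among the k nearest training samples."""
    stats = fit_zscore(train)
    z_sample = apply_zscore(sample, stats)
    ranked = sorted(
        zip(train, labels),
        key=lambda pair: euclidean(apply_zscore(pair[0], stats), z_sample),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1, as used in the experiment, the vote reduces to copying the label of the single closest training sample.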

4. Artificial Neural Network (ANN)

The artificial neural network (ANN) learning models built are all feed-forward,
fully connected, and non-recurrent. The input layer of each ANN contains
50 units, which correspond to the 50 input values (the "top 50" scoring genes). Each
ANN has one hidden layer with 4 units, and an output layer that contains two units,
which represent the two class labels. In a preprocessing step all input data were
normalized using the z-score method. The apparent error was estimated using 3-fold
cross-validation. That is, for each training procedure, the training samples were
randomly shuffled and divided into three groups of approximately equal size. A
model was built with two of the groups and the third group was set aside for
validation. This step was repeated three times, each time with a different group for
validation. This shuffling-training process was repeated ten times, resulting in 30
ANN models. Each test sample was fed into each of the 30 ANN models, and the
output was the average of the 30 outputs. The class predicted was the one that was
represented by the output unit with the larger average output value.
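
The shuffling and averaging scheme can be sketched independently of the network itself. The code below only builds the 30 training/validation partitions (ten shuffles, three folds each) and averages two-unit outputs; training the ANN models is out of scope, and the function names, the fixed random seed, and the tie-breaking rule are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def repeated_three_fold(n_samples: int, repeats: int = 10,
                        seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for repeats x 3 folds."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Three groups of approximately equal size.
        folds = [idx[i::3] for i in range(3)]
        for held_out in range(3):
            validation = folds[held_out]
            train = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
            splits.append((train, validation))
    return splits


def ensemble_predict(
    model_outputs: Sequence[Tuple[float, float]],
) -> Tuple[int, Tuple[float, float]]:
    """Average the two output units over all models and pick the larger.

    Ties are resolved in favor of class 0 (an arbitrary choice).
    """
    n = len(model_outputs)
    avg0 = sum(out[0] for out in model_outputs) / n
    avg1 = sum(out[1] for out in model_outputs) / n
    return (0 if avg0 >= avg1 else 1), (avg0, avg1)
```

Each of the 30 partitions would yield one trained model; a test sample is then passed through all of them and `ensemble_predict` combines their outputs.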

F. Table of results using the different algorithms to predict the genetic subgroups

A summary of the true prediction accuracy on the blinded test set of 112 cases
is presented in Tables 37-39. Sensitivity was calculated as the number of positive
samples correctly predicted divided by the number of true positives. Specificity was
calculated as the number of negative samples correctly predicted divided by the number
of true negatives.
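
As a worked illustration of these definitions, the sketch below computes accuracy, sensitivity, and specificity as percentages from paired true and predicted labels. It assumes that "positive" means membership in the subtype under test and that both classes are present in the data; the function name is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool],
                   predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, each as a percentage."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return {
        "accuracy": 100.0 * (tp + tn) / len(truth),
        # Correctly predicted positives over all truly positive samples.
        "sensitivity": 100.0 * tp / positives,
        # Correctly predicted negatives over all truly negative samples.
        "specificity": 100.0 * tn / negatives,
    }
```

For a rare subtype, accuracy can stay high even when sensitivity is low, because the many correctly rejected negatives dominate the accuracy figure.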




WO 03/083140 



PCT/US03/08486 



Table 37. True Prediction Accuracy Results on Test Set using SVM and ANN algorithms

                             SVM                                   ANN
                             ChiSq    CFS    T-stats    SOM/DAV    Wilkins'
T-ALL       True Accuracy    100      100    100        100        100
            Sensitivity      100      100    100        100        100
            Specificity      100      100    100        100        100
E2A-PBX1    True Accuracy    100      100    100        100        100
            Sensitivity      100      100    100        100        100
            Specificity      100      100    100        100        100
TEL-AML1    True Accuracy    99       99     98         97         100
            Sensitivity      100      100    100        100        100
            Specificity      98       98     97         97         100
BCR-ABL     True Accuracy    95       97     94         97         97
            Sensitivity      50       67     33         83         83
            Specificity      100      100    100        98         98
MLL         True Accuracy    100      98     100        97         100
            Sensitivity      100      100    100        86         100
            Specificity      100      98     100        100        100
H>50        True Accuracy    96       96     96         95         94
            Sensitivity      100      100    100        95         100
            Specificity      93       93     93         93         89


Table 38. True Prediction Accuracy Results on Test Set using k-NN

                             k-NN
                             Chi Sq    CFS    T-stats    Wilkins'
T-ALL       True Accuracy    100       100    100        100
            Sensitivity      100       100    100        100
            Specificity      100       100    100        100
E2A-PBX1    True Accuracy    100       100    100        100
            Sensitivity      100       100    100        100
            Specificity      100       100    100        100
TEL-AML1    True Accuracy    98        98     99         100
            Sensitivity      100       96     96         100
            Specificity      97        98     100        100
BCR-ABL     True Accuracy    94        97     95         93
            Sensitivity      33        67     50         67
            Specificity      100       100    100        96
MLL         True Accuracy    100       98     95         100
            Sensitivity      100       83     100        100
            Specificity      100       100    94         100
H>50        True Accuracy    98        96     94         98
            Sensitivity      100       100    95         100
            Specificity      96        93     93         96

Table 39. True Prediction Accuracy Results on Test Set using PCL

                             PCL
                             Chi Sq         CFS
T-ALL       True Accuracy    100            100
            Sensitivity                     100
            Specificity      100            100
E2A-PBX1    True Accuracy    ND             100
            Sensitivity                     100
            Specificity      ND             100
TEL-AML1    True Accuracy    99             ND
            Sensitivity      [illegible]    ND
            Specificity      100            ND
BCR-ABL     True Accuracy    97             ND
            Sensitivity      67             ND
            Specificity      100            ND
MLL         True Accuracy    100            ND
            Sensitivity      100            ND
            Specificity      100            ND
H>50        True Accuracy    98             ND
            Sensitivity      100            ND
            Specificity      96             ND




The assignment of a leukemic sample to a specific biologic subgroup is more
accurately reflected by its gene expression profile than by the presence or absence of a
specific genetic lesion. For example, four patients whose expression profiles were
classified as TEL-AML1, despite lacking a TEL-AML1 chimeric message by
reverse transcriptase polymerase chain reaction (RT-PCR), were found to have an
alteration in TEL, suggesting a common underlying biology. Thus, from a technical
viewpoint, gene expression profiling provides a viable alternative to standard
diagnostic approaches.

G. Absence of correlation of expression data for genetic subtypes with stage of B-cell
differentiation

The expression profiles of the different risk groups of B-cell leukemias do
not correspond to markers of different stages of B-cell differentiation. The first issue
is defining the stage of B-cell differentiation. The defined stages of BM-derived
B-cells relevant to pediatric ALL are outlined below in Table 40, along with their
frequency in pediatric ALL (Campana and Behm (2000) J. Immunological Methods
243:59-75). Three stages of differentiation are defined by a limited number of
markers. In Table 41 below, the distribution of the leukemia cases into these B-cell
differentiation stages is shown. As can be seen, none of the genetic subtypes is
specifically associated with one of these three stages of differentiation. Thus, this
simple analysis clearly shows that the majority of the chromosomal translocation
subgroups in pediatric ALL do not correspond to a specific stage of B-cell
differentiation. This is a well-known fact in the field of pediatric ALL and differs
from the relationship typically seen between chromosomal translocations and other
genetic lesions, and the stage of differentiation seen in B-cell lymphomas.



Subtype         Leukocyte antigen expression (% of cases positive)     Frequency (%)
                CD19    CD22    cIgμ    sIgμ    sIg κ or λ
Early Pre-B     100     >95     0       0       0                       60-65
Pre-B           100     100     100     0       0                       20-25
Transitional    100     100     100     100     0                       1-3

Abbreviations: cIgμ, cytoplasmic immunoglobulin μ chain; sIgμ, surface immunoglobulin μ chain;
sIg κ or λ, surface immunoglobulin κ or λ chains

a D. Campana and F.G. Behm, "Immunophenotyping of leukemia", Journal of Immunological Methods
243: 59-75, 2000.

Table 41. Distribution of genetic subtypes by immunophenotype^a

               EARLY PRE-B    PRE-B    TRANSITIONAL PRE-B
E2A            0              17       6
TEL            55             23       0
BCR            11             3        0
MLL            12             6        1
Hyperdip>50    49             9        5
Novel          8              4        1
Total          172            77       24

a For this analysis, samples with other immunophenotypes (NOS or mature B-cell) were not included

The next goal was to determine whether a set of genes could accurately
identify subjects by their stage of differentiation, regardless of leukemia risk group.
To accomplish this, cases were assigned into one of three classes, early pre-B, pre-B,
or transitional pre-B, based on their immunophenotype. The top 50 genes that
distinguished each group from the other two groups were selected using the Wilkins'
metric. These genes were then used in an ANN analysis to assess their performance
in correctly classifying the 273 diagnostic B-lineage ALL samples for which a stage
of differentiation could be determined, through a process of cross validation. The
results of this analysis are included below.

Table 42. Accuracy Results for immunophenotype discrimination using
Wilkins' metric and ANN algorithm

                        Accuracy    Sensitivity    Specificity
Early Pre-B^a           78.39%      85.47%         66.34%
Pre-B^b                 71.79%      38.96%         84.69%
Transitional Pre-B^c    91.24%      33.33%         96.79%

a Cells with CD19+, CD22+, cytoplasmic Igμ-, surface Igμ- immunophenotype
b Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ- immunophenotype
c Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ+ immunophenotype

The selected genes performed rather poorly in correctly assigning cases to specific
B-cell differentiation stages, with accuracies well below those achieved for prediction of
the genetic subgroups. When these genes were used in a two-dimensional hierarchical
clustering algorithm, they failed to cluster cases by immunophenotype but instead
resulted in the loose clustering of some of the genetic subgroups, including
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, and hyperdiploid >50. The analysis was
repeated using genes selected by DAV and, again, no clustering of the
immunophenotypically-defined stages was observed. Thus, it was not possible to
identify expression profiles that can accurately identify the immunophenotypically-defined
differentiation stages of pediatric B-cell ALL. Moreover, the expression
profiles that were defined for the genetic subtypes are not profiles that correspond to
specific stages of B-cell differentiation. Although some of the genes that define
specific genetic subtypes can be associated with a particular stage of B-cell
differentiation, the majority of the discriminating genes show no correlation with
differentiation.



H. Results for relapse prediction

In the prediction of whether a patient would go into continuous complete
remission or would relapse, a subtype-specific approach was adopted. An individual
classifier was constructed for each subtype of ALL. Given a sample, the subtype was
first predicted, and then the corresponding subtype-specific prognostic classifier was
invoked to predict whether the patient would relapse. This subtype-specific approach
was required because an expression profile predictive of relapse for the entire group
could not be defined.

In the construction of the subtype-specific classifiers, genes were selected by CFS
unless this algorithm returned >20 genes, in which case the top 20 ranked genes by
T-statistics were used. When the T-statistics method was used, the selection of how
many among the top 20 T-statistics genes were to be used was made by performing
cross validation experiments; that is, the top n genes for n = 1..20 were picked, and the n
that gave the best cross validation results was selected. The cross validation results
for the optimal choice of genes are summarized in Table 43 below. The genes that
were chosen for use in subtype-specific relapse predictions are summarized in Tables
44-48.
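
The choice of n by cross validation can be sketched as below. This is an illustration only: a nearest-centroid rule with leave-one-out cross validation stands in for the actual classifier and cross validation scheme, the gene ranking is taken as given, and ties are resolved in favor of the smaller n. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def centroid(rows: Sequence[Sequence[float]], genes: Sequence[int]) -> List[float]:
    """Mean expression of the selected genes over a group of samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def loo_accuracy(data: Sequence[Sequence[float]], labels: Sequence[str],
                 genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid rule on the given genes."""
    correct = 0
    for i in range(len(data)):
        train = [(r, l) for j, (r, l) in enumerate(zip(data, labels)) if j != i]
        best_label, best_dist = None, float("inf")
        for cls in sorted({l for _, l in train}):
            cen = centroid([r for r, l in train if l == cls], genes)
            dist = sum((data[i][g] - m) ** 2 for g, m in zip(genes, cen))
            if dist < best_dist:
                best_label, best_dist = cls, dist
        correct += best_label == labels[i]
    return correct / len(data)


def choose_n(data: Sequence[Sequence[float]], labels: Sequence[str],
             ranked_genes: Sequence[int], max_n: int = 20) -> Tuple[int, float]:
    """Try the top n ranked genes for n = 1..max_n and keep the best n."""
    best_n, best_acc = 0, -1.0
    for n in range(1, min(max_n, len(ranked_genes)) + 1):
        acc = loo_accuracy(data, labels, ranked_genes[:n])
        if acc > best_acc:  # strict comparison keeps the smallest n on ties
            best_n, best_acc = n, acc
    return best_n, best_acc
```

Note that in this sketch the ranking is fixed before cross validation; re-ranking the genes inside each fold would give a less optimistic accuracy estimate.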

Table 43. Results of relapse prediction on indicated subgroups

            Relapse    CCR    # genes    metric     Accuracy    P value by permutation test
T-ALL       8          26     7          t-stats    97          0.034
H>50        5          43     13         t-stats    100         0.018
TEL-AML1    3          56     7          CFS        100         0.145
MLL         5          7      4          t-stats    100         0.104
Others      4          56     20         t-stats    98.3        0.079



Table 44. Genes selected by T-statistics/CFS for relapse (T-ALL)

Gene Name | GeneSymbol | Reference Number | Above/Below Mean
Human TBXAS1 gene for thromboxane synthase | TBXAS1 | D34625 | Above
Homo sapiens mRNA for 41-kDa phosphoribosylpyrophosphate synthetase-associated protein | | AB007851 | Above
Human DNA sequence from PAC 370M22 | | Z82206 | Above
Human spinal muscular atrophy gene | SMA5 | X83301 | Above
Human cell surface glycoprotein CD44 | CD44 | L05424 | Above
Human mRNA for KIAA0056 gene | KIAA0056 | D29954 | Above
Human BTK region clone ftp-3 mRNA | | U01923 | Above



Table 45. Genes selected by T-statistics/CFS for relapse (Hyperdiploid >50)

No. | Affymetrix number | Gene Name | Gene Symbol | Reference Number | Above/Below Mean
1 | 37721_at | deoxyhypusine synthase | DHPS | U79262 | Above
2 | 38721_at | KIAA1536 protein | KIAA1536 | W72733 | Above
3 | 40120_at | hydroxyacyl glutathione hydrolase | HAGH | X90999 | Above
4 | 41386_i_at | KIAA0346 protein | KIAA0346 | AB002344 | Above
5 | 38677_at | stress 70 protein chaperone microsome-associated 60kD | STCH | U04735 | Above
6 | 37620_at | Human TFIID subunits TAF20 and TAF15 mRNA, complete cds. | | U57693 | Above
7 | 34703_f_at | EST | | AA151971 | Above
8 | 38355_at | DEAD/H Asp-Glu-Ala-Asp/His box polypeptide Y chromosome | DBY | AF000984 | Above
9 | 41214_at | ribosomal protein S4 Y-linked | RPS4Y | M58459 | Above
10 | 34530_at | Homo sapiens cDNA FLJ22448 fis clone HRC09541 | | W73822 | Above
11 | 603_at | nuclear receptor subfamily 2 group C member 1 | NR2C1 | M29960 | Above
12 | 32697_at | inositol myo 1 or 4 monophosphatase 1 | IMPA1 | AF042729 | Above
13 | 41129_at | KIAA0033 protein | KIAA0033 | D26067 | Above
14 | 33333_at | KIAA0403 protein | KIAA0403 | AB007863 | Above
15 | 37078_at | CD3Z antigen zeta polypeptide TiT3 complex | CD3Z | J04132 | Above
16 | 38148_at | cryptochrome 1 photolyase-like | CRY1 | D83702 | Above
17 | 39150_at | ring finger protein 11 | RNF11 | U69559 | Above
18 | 33869_at | DKFZp586N1323 from clone DKFZp586N1323 | | AL050218 | Above
19 | 41447_at | KIAA0990 protein | KIAA0990 | AB023207 | Above
20 | 39369_at | KIAA0935 protein | KIAA0935 | AB023152 | Above



Table 46. Genes selected by T-statistics/CFS for relapse (TEL-AML1)

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Above/Below Mean
1 | 35797_at | Human interleukin-13 gene | IL-13Ra | Y10659 | Above
2 | 37524_at | Human death-associated protein kinase | DRAK2 | AB011421 | Above
3 | 34243_i_at | Human l(3)mbt protein homolog mRNA | | U89358 | Above
4 | 41398_at | Homo sapiens mRNA, cDNA DKFZp564A186 | | AL049305 | Above
5 | 35195_at | H. sapiens mRNA for phosphate cyclase | | Y11651 | Above
6 | 32393_s_at | Homo sapiens cDNA | | W27466 | Above
7 | 31909_at | Homo sapiens mRNA for KIAA0754 protein | KIAA0754 | AB018297 | Above

Table 47. Genes selected by T-statistics/CFS for relapse (MLL)

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Above/Below Mean
1 | 294_s_at | Protein Kinase Pitslre, Alpha, Alt. Splice 1-Feb | | | Below
2 | 38226_at | 23h11 Homo sapiens cDNA | | W27152 | Below
3 | 1398_g_at | Human protein kinase (MLK-3) mRNA | HUMMLK3A | L32976 | Above
4 | 409_at | Human mRNA for 14.3.3 protein, a protein kinase regulator | | X56468 | Below


Table 48. Genes selected by T-statistics/CFS for relapse (Others)

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 33782_r_at | mi82f03.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1090397 | | AA587372 | Above
2 | 33338_at | Human transcription factor ISGF-3 mRNA | | M97936 | Above
3 | 40242_at | Human (clone N5-4) protein p84 mRNA | | L36529 | Above
4 | 37018_at | qd05c04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-1722822 | | AI189287 | Above
5 | 38337_at | Homo sapiens zinc finger protein mRNA | | U62392 | Above
6 | 41464_at | Human mRNA for KIAA0339 gene | KIAA0339 | AB002337 | Above
7 | 38064_at | H. sapiens lrp mRNA | LRP | X79882 | Above
8 | 33173_g_at | yc89b05.r1 Homo sapiens cDNA, 5 end /clone=IMAGE-23231 | | T75292 | Below
9 | 33365_at | Homo sapiens mRNA for KIAA0945 protein | KIAA0945 | AB023162 | Above
10 | 39367_at | ni38e08.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-979142 | | AA522537 | Above
11 | 41108_at | Homo sapiens mRNA for putative GTP-binding protein | PGPL | Y14391 | Above
12 | 37304_at | Homo sapiens heterochromatin protein p25 mRNA | P25beta | U35451 | Below
13 | 40359_at | Human DNA-binding protein (HRC1) mRNA | HRC1 | M91083 | Above
14 | 32792_at | Human DNA sequence from clone 465N24 on chromosome 1p35.1-36.13. Contains two novel genes, ESTs, GSSs and CpG islands | | AL031432 | Above
15 | 34726_at | Human voltage-gated calcium channel beta subunit mRNA | | U07139 | Above
16 | 40299_at | Homo sapiens G-protein coupled receptor RE2 mRNA | | AF091890 | Above
17 | 40704_at | H.sapiens mRNA for phosphatidylinositol 3-kinase | | Z29090 | Above
18 | 38568_at | Homo sapiens p53 binding protein mRNA | | U82939 | Above
19 | 32038_s_at | wi30c12.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2391766 | | AI739308 | Above
20 | 39613_at | H.sapiens HUMM9 mRNA | | X74837 | Above



I. Permutation test results

As the number of relapse samples was small, 1000 permutation experiments were
performed for each subtype-specific relapse study in addition to the usual cross
validation experiments. In each permutation experiment, the samples were
re-partitioned in a manner that preserved class size by randomly swapping the class
labels ("relapse" or "continuous complete remission"). The same metric was then
employed to pick the same number of genes as in the original partitioning of the
samples given by the original class labels. SVM was then used to obtain a prediction
accuracy by cross validation for this random partition using these freshly selected
genes. The proportion of these 1000 permutation experiments that achieved the same
accuracy as the original samples was taken as a p-value, indicating how often a
random partition of the original samples could match the original accuracy. The
results of these permutation experiments are summarized in the last column of
Table 43 above. These results show that the high accuracy obtained on the
predictability of relapse in T-lineage ALL, Hyperdiploid>50, and others is unlikely
to be a random event. The higher p-values obtained for the subtypes of TEL-AML1 and
MLL are probably due to the small number of relapse samples available for analysis.
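
The permutation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: a nearest-centroid classifier with leave-one-out cross validation stands in for the SVM, an unequal-variance t-statistic stands in for "the same metric", and all function names (abs_t, top_genes, loocv_accuracy, permutation_p_value) are hypothetical. X is assumed to be a list of samples (each a list of expression values) and y a list of 0/1 class labels.

```python
import random
from statistics import mean, stdev


def abs_t(a, b):
    """Absolute two-sample t-statistic (unequal-variance form)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, k):
    """Indices of the k genes with the largest |t| between the two classes."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs_t(pos, neg), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i, (row, lab) in enumerate(zip(X, y)):
        dist = {}
        for c in (0, 1):
            # Class centroid computed without the held-out sample i.
            train = [r for j, (r, l) in enumerate(zip(X, y)) if j != i and l == c]
            centroid = [mean(r[g] for r in train) for g in genes]
            dist[c] = sum((row[g] - m) ** 2 for g, m in zip(genes, centroid))
        correct += min(dist, key=dist.get) == lab
    return correct / len(X)


def permutation_p_value(X, y, k, n_perm=1000, seed=0):
    """Return (observed accuracy, fraction of label shuffles that match it)."""
    rng = random.Random(seed)
    observed = loocv_accuracy(X, y, top_genes(X, y, k))
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # swaps labels but keeps both class sizes fixed
        genes = top_genes(X, shuffled, k)  # genes are re-selected every time
        if loocv_accuracy(X, shuffled, genes) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Calling permutation_p_value(X, y, k=7) on a relapse/CCR data set would return the cross-validated accuracy for the true labels together with the fraction of shuffled labelings that reach it. Re-selecting the genes inside every permutation mirrors the "freshly selected genes" step in the text; reusing the originally selected genes would understate the p-value.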

Table 49. Permutation test results for predictors of T-ALL relapse

Rank | Affymetrix number | t-statistic value | Perm 1% | Perm 5% | neighbors
1 | 33777_at | 7.8337 | 7.3774 | 5.4783 | 6
2 | 41853_at | 6.1727 | 6.5948 | 4.8117 | 16
3 | 38866_at | 5.9890 | 6.0293 | 4.5611 | 12
4 | 41643_at | 5.6106 | 5.6815 | 4.3877 | 12
5 | 1126_s_at | 5.4777 | 5.5162 | 4.2375 | 11
6 | 41862_at | 5.3734 | 5.3759 | 4.1208 | 11
7 | 41131_f_at | 4.9134 | 5.2280 | 4.0295 | 17

Table 50. Permutation test results for predictors of Hyperdiploid > 50 relapse

Rank | Affymetrix number | t-statistics value | Perm 1% | Perm 5% | neighbors
1 | 37721_at | 8.7160 | 12.7358 | 9.9506 | 75
2 | 38721_at | 8.4162 | 10.7256 | 8.8438 | 59
3 | 40120_at | 7.2736 | 9.9837 | 8.0383 | 73
4 | 41386_i_at | 6.3436 | 9.0552 | 7.5579 | 88
5 | 38677_at | 6.2698 | 8.8633 | 7.2466 | 88
6 | 37620_at | 6.2174 | 8.4154 | 6.9604 | 82
7 | 34703_f_at | 6.0770 | 8.0982 | 6.8835 | 83
8 | 38355_at | 5.5120 | 7.8657 | 6.7434 | 92
9 | 41214_at | 5.4262 | 7.6583 | 6.6094 | 90
10 | 34530_at | 5.4013 | 7.5991 | 6.5109 | 87
11 | 603_at | 5.3142 | 7.5903 | 6.4409 | 87
12 | 32697_at | 5.1785 | 7.5146 | 6.3265 | 90
13 | 41129_at | 5.1450 | 7.3939 | 6.2121 | 88
14 | 33333_at | 5.1061 | 7.2601 | 6.1389 | 87
15 | 37078_at | 5.0738 | 7.1484 | 6.0308 | 86
16 | 38148_at | 4.9256 | 6.9688 | 5.9230 | 93
17 | 39150_at | 4.9061 | 6.9273 | 5.9015 | 93
18 | 33869_at | 4.8256 | 6.8900 | 5.8367 | 93
19 | 41447_at | 4.7919 | 6.8135 | 5.7621 | 93
20 | 39369_at | 4.7790 | 6.7731 | 5.7391 | 92



Individually, the discriminating genes for relapse in T-ALL are significant at either 
the 1% or 5% level, while those for hyperdiploid >50 fall at approximately the 7% 
level. 
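
One way to read the "Perm 1%", "Perm 5%" and "neighbors" columns of the two tables above is as a rank-matched permutation analysis: for each rank, the observed statistic is compared with the statistic found at the same rank after shuffling the class labels. Under that reading, a "neighbors" count of 75 out of 1000 shuffles corresponds to roughly the 7% level quoted for hyperdiploid >50. The sketch below follows this interpretation, which is an assumption and is not spelled out in the text; the function names (abs_t, ranked_scores, rank_thresholds) are hypothetical.

```python
import random
from statistics import mean, stdev


def abs_t(a, b):
    """Absolute two-sample t-statistic (unequal-variance form)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def ranked_scores(X, y):
    """|t| of every gene, sorted from largest to smallest."""
    out = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        out.append(abs_t(pos, neg))
    return sorted(out, reverse=True)


def rank_thresholds(X, y, top=5, n_perm=1000, seed=0):
    """Rows of (rank, observed, perm 1%, perm 5%, neighbors) for the top ranks."""
    rng = random.Random(seed)
    observed = ranked_scores(X, y)[:top]
    null = [[] for _ in range(top)]
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # class sizes are preserved
        for r, s in enumerate(ranked_scores(X, shuffled)[:top]):
            null[r].append(s)
    table = []
    for r in range(top):
        dist = sorted(null[r], reverse=True)
        perm1 = dist[max(0, n_perm // 100 - 1)]  # reached by about 1% of shuffles
        perm5 = dist[max(0, n_perm // 20 - 1)]   # reached by about 5% of shuffles
        neighbors = sum(s >= observed[r] for s in dist)  # shuffles at least as extreme
        table.append((r + 1, observed[r], perm1, perm5, neighbors))
    return table
```

With this reading, a gene is significant at the 1% level when its observed value exceeds the "Perm 1%" entry for its rank, and neighbors / n_perm gives an empirical per-rank p-value.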

Table 51. Results of relapse prediction on indicated subgroups

Subgroup | Relapse | CCR | # genes | metric | Accuracy (%) | P value by permutation test
T-ALL | 8 | 26 | 7 | t-stats | 97 | 0.034
H>50 | 5 | 43 | 13 | t-stats | 100 | 0.018
TEL-AML1 | 3 | 56 | 7 | CFS | 100 | 0.145
MLL | 5 | 7 | 4 | t-stats | 100 | 0.104
Others | 4 | 56 | 20 | t-stats | 98.3 | 0.079


As the number of relapse samples was small, 1000 permutation experiments were also
performed for each subtype-specific relapse study in addition to the usual cross
validation experiments. In each permutation experiment, the samples were
re-partitioned in a manner that preserved class size by randomly swapping the class
labels ("relapse" or "continuous complete remission"). The same metric was
employed to pick the same number of genes as in the original partitioning of the
samples given by the original class labels. SVM was then used to obtain a prediction
accuracy by cross validation for this random partition using these freshly selected
genes. The proportion of these 1000 permutation experiments that achieved the same
accuracy as the original samples was taken as a p-value, indicating how often a
random partition of the original samples could match the original accuracy. The
results of these permutation experiments are summarized in the last column of
Table 51 above. These results show that the high accuracy obtained on the
predictability of relapse in T-lineage ALL, Hyperdiploid>50, and others is unlikely
to be a random event. The p-values for the subtypes of TEL-AML1 and MLL are weaker
than those of the other subtypes. However, in the case of TEL-AML1 the number of
relapse samples was exceedingly small (3), and in the case of MLL the numbers of
relapse and non-relapse samples were both very small.

J. Results for secondary AML prediction

For the secondary AML prediction, the same subtype-specific approach was adopted as
described earlier for relapse prediction. This time only the TEL-AML1 subtype had a
sufficient number of samples for a secondary AML prediction model to be developed.
For this model, the MIT score (Golub et al. (1999) Science 286:531-37, herein
incorporated by reference) was used to select genes and SVM to perform
classification using these genes. The MIT score of a gene is defined as
$T = |\mu_1 - \mu_2| / (\sigma_1 + \sigma_2)$, where $\mu_i$ is the mean expression
of that gene in the i-th class and $\sigma_i$ is the standard deviation of that gene
in the i-th class. This formula assigns a higher value to a gene that has a larger
mean difference between the two classes and a smaller variance within both classes.
The 20 genes with the highest MIT scores in TEL-AML1 patients that went into
continuous complete remission versus those TEL-AML1 samples that developed secondary
AML are listed in Table 52 below. 100% accuracy for secondary AML prediction was
achieved on TEL-AML1 subtype samples using these 20 genes. A permutation test was
also performed in the same manner as described earlier for the subtype-specific
relapse prediction, and a p-value of 0.031 was obtained, demonstrating that the
predictability of the development of secondary AML in TEL-AML1 patients was unlikely
to be a random event.
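
The MIT score defined above is simple to compute per gene. The sketch below is illustrative only: the helper names (mit_score, top_by_mit) are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used. As a worked example, expression values [1, 2, 3] in one class and [7, 8, 9] in the other give means 2 and 8 and standard deviations 1 and 1, so T = |2 - 8| / (1 + 1) = 3.

```python
from statistics import mean, stdev


def mit_score(class1, class2):
    """T = |mu1 - mu2| / (sigma1 + sigma2) for one gene."""
    spread = stdev(class1) + stdev(class2)  # sample standard deviations (assumed)
    if spread == 0:
        # Degenerate case: no within-class variation in either class.
        return float("inf") if mean(class1) != mean(class2) else 0.0
    return abs(mean(class1) - mean(class2)) / spread


def top_by_mit(X, y, k):
    """Return (score, gene index) pairs for the k highest-scoring genes."""
    scored = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        scored.append((mit_score(a, b), g))
    return sorted(scored, reverse=True)[:k]
```

Here X is assumed to be a list of samples (each a list of expression values) and y the matching list of 0/1 labels; top_by_mit(X, y, 20) would correspond to picking the 20 highest-scoring genes.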
Table 52. Genes selected by MIT score for secondary AML

No. | Affymetrix Number | Gene Name | Gene Symbol | Reference Number | Above/Below Mean
TEL-AML1
1 | 34890_at | ATPase H transporting lysosomal vacuolar proton pump alpha polypeptide 70kD isoform 1 | ATP6A1 | L09235 | Above
2 | 40925_at | hypothetical protein FLJ10803 | FLJ10803 | AA554945 | Above
3 | 1719_at | mutS E. coli homolog 3 | MSH3 | U61981 | Above
4 | 32877_i_at | EST IMAGE:954213 | (none) | AA524802 | Above
5 | 32650_at | neuronal protein | NP25 | Z78388 | Above
6 | 33173_g_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | Above
7 | 32545_r_at | RSU-1/RSP-1 | RSU-1 | L12535 | Above
8 | 34889_at | ATPase H transporting lysosomal vacuolar proton pump alpha polypeptide 70kD isoform 1 | ATP6A1 | AA056747 | Above
9 | 35180_at | cDNA DKFZp586F1323 from clone DKFZp586F1323 | (none) | AL050205 | Above
10 | 34274_at | KIAA1116 protein | KIAA1116 | AB029039 | Above
11 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AI249721 | Above
12 | 1627_at | tyrosine kinase (GB:Z25437) | (none) | HG2715-HT2811 | Above
13 | 1461_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor alpha | NFKBIA | M69043 | Below
14 | 36023_at | lacrimal proline rich protein | LPRP | AI864120 | Above
15 | 39167_r_at | serine or cysteine proteinase inhibitor clade H heat shock protein 47 member 2 | SERPINH2 | D83174 | Above
16 | 39969_at | H4 histone family member G | H4FG | AA255502 | Above
17 | 38692_at | NGFI-A binding protein 1 EGR1 binding protein 1 | NAB1 | AF045451 | Above
18 | 1594_at | polymerase RNA II DNA directed polypeptide C 33kD | POLR2C | J05448 | Above
19 | 33234_at | RBP1-like protein | LOC51742 | AA887480 | Above
20 | 34739_at | hypothetical protein FLJ20275 | FLJ20275 | W26023 | Above
Table 53. Permutation test results for secondary AML

Rank | Affymetrix number | t-statistics value | Perm 1% | Perm 5% | Perm median | neighbors
1 | 34890_at | 1.2204 | 2.7933 | 2.2138 | 1.4712 | 822
2 | 40925_at | 1.0712 | 2.0006 | 1.7607 | 1.2884 | 859
3 | 1719_at | 1.0599 | 1.8536 | 1.6272 | 1.1894 | 767
4 | 32877_i_at | 1.0364 | 1.7125 | 1.5218 | 1.1200 | 715
5 | 32650_at | 1.0217 | 1.6580 | 1.4584 | 1.0776 | 646
6 | 33173_g_at | 1.0126 | 1.5868 | 1.4132 | 1.0416 | 595
7 | 32545_r_at | 1.0097 | 1.5536 | 1.3630 | 1.0223 | 536
8 | 34889_at | 0.9959 | 1.5164 | 1.3241 | 1.0009 | 512
9 | 35180_at | 0.9854 | 1.4838 | 1.2938 | 0.9777 | 477
10 | 34274_at | 0.9420 | 1.4759 | 1.2721 | 0.9600 | 550
11 | 35727_at | 0.8493 | 1.4482 | 1.2507 | 0.9415 | 809
12 | 1627_at | 0.8471 | 1.4207 | 1.2398 | 0.9254 | 782
13 | 1461_at | 0.8312 | 1.4012 | 1.2260 | 0.9114 | 801
14 | 36023_at | 0.8177 | 1.3551 | 1.2012 | 0.8995 | 813
15 | 39167_r_at | 0.8136 | 1.3462 | 1.1806 | 0.8894 | 790
16 | 39969_at | 0.8122 | 1.3395 | 1.1702 | 0.8785 | 759
17 | 38692_at | 0.8109 | 1.3333 | 1.1565 | 0.8696 | 729
18 | 1594_at | 0.8103 | 1.3142 | 1.1503 | 0.8626 | 696



Table 54: Additional Genes selected by T statistics for BCR-ABL risk group

Gene symbol | Accession Number
TUBA1 | HG2259-HT2348
TUBA1 | X06956
CRADD | U84388
SLC2A5 | M55531
PHYH | AF023462
ZFPL1 | AF001891
CD34 | S53911
KIAA0015 | D13640
CLECSF2 | X96719
CD34 | M81945
GAB1 | U43885
E2F5 | U31556
CLTB | M20470
ENG | X72012
LOC55884 | AF038187
TNFRSF1A | M58286
TMSNB | D82345
SNL | U03057
KIAA0990 | AB023207
MAP1A | W26631
MYPT2 | AB007972
IFI30 | J03909
ERPROT213-21 | U94836
DKFZP586A0522 | AL050159
LOC51109 | AA126515
(none) | W29087
TSTA3 | U58766
TNFRSF1B | AI813532
GSN | X04412
KIAA0582 | AI761647
STATI2 | AF037989
(none) | AL049313
ITGA4 | X16983
FLJ20500 | AA522530
SDR1 | AF061741
ARHGEF4 | AB029035
C18ORF1 | AF009426
MAPK14 | U19775
FHL1 | AF063002
GATA3 | X58072
KIAA0676 | D38548
KCNN1 | U69883
POM121L1 | D87002
IFI30 | J03909
ABL1 | X16416
NELL2 | D83018
MEST | D78611
S100A4 | W72186
D12S2489E | AJ001687
ATP2B4 | W28589
CTGF | X78947
RGS1 | S59049
CDK9 | X80230
(none) | AI524873
STIM1 | U52426
VEGFB | U48801
PPP2R2A | M64929
CASP2 | U13022
SPS | U34044
HRK | D83699
KIAA0870 | AB020677
ABL1 | U07563
PKIA | S76965
FLJ12474 | AA306076
CD97 | X94630
HCK | M16591
FYN | M14333
KIR2DL3 | AC006293
DMPK | L08835
N33 | U42360
FLJ13949 | AL041879
PRKCZ | Z15108
IL17R | U58917
FMR2 | U48436
INSR | M10051
AHNAK | M80899
KIAA0878 | AB020685
CD86 | U04343
(none) | U82303
KIAA1043 | AL033538
N33 | U42349
SYN47 | Y17829
ITPR1 | D26070
SFRS9 | AL021546
EPOR | M60459
GAC1 | AF030435
CAMK4 | D30742
KIAA0084 | D42043
LAT | AJ223280
XBP1 | Z93930
FLT3LG | U03858
TESK1 | D50863
(none) | AF070633
KIAA0681 | U89358
FUT8 | Y17979



Table 55: Additional Genes selected by T statistics for E2A-PBX1 Risk Group

Gene symbol | Accession Number
PBX1 | M86546
(none) | AL049381
FAT | X87241
BLK | S76617
IRF4 | U52682
GS3955 | D87119
KIAA0802 | AB018345
SCHIP-1 | AF070614
SNL | U03057
KIAA0655 | AB014555
GS3955 | D87119
IGFBP7 | L19182
CDKN1A | U03106
CSF2RB | H04668
STATI2 | AF037989
KIAA1029 | AB028952
KIAA0247 | D87434
(none) | AL049397
NP | X00737
TM4SF2 | L10373
ALOX5 | J03600
LRMP | U10485
PTPN2 | AI828880
ALOX5AP | AI806222
AEBP1 | AF053944
TGFBR2 | D50683
ODC1 | M33764
NED2 | D86425
ODC1 | X16277
CBX1 | U35451
CSF3R | M59820
KIAA0172 | D79994
IL1B | M15330
KIAA0922 | AB023139
LOC51097 | AA005018
TUBA1 | X06956
ITGA6 | S66213
NFKBIL1 | Y14768
ADPRT | J03473
ADPRT | J03473
CSF3R | M59818
EFNB1 | U09303
CD9 | M38690
CDKN2D | U40343
KIAA0442 | AB007902
PRKCZ | Z15108
(none) | AF055029
RECK | D50406
GOLGA3 | D63997
ZAP70 | L05148
FLI1 | M98833
LASP1 | X82456
(none) | AJ001381
TBXA2R | D38081
BHLHB2 | AB004066
ADARB1 | U76421
PTPN6 | X62055
(none) | X58398
TIMP1 | D11139
KIAA0554 | AB011126
SRP14 | AI525652
ATP9A | AB014511
HELO1 | AL034374
GNAQ | U43083
POU4F1 | X64624
MERTK | U08023
KIAA0625 | AB014525
PCLO | AB011131
IL7R | AF043129
ITGA6 | X53586
TUBA1 | HG2259-HT2348
PIR121 | L47738
MAGED1 | W26633
CD48 | M37766
TLR1 | AL050262
NPR1 | X15357
GLUL | X59834
DAPK1 | X76104
(none) | X58398
ARHGEF4 | AB029035
NKEFB | L19185
(none) | AL049435
ITM2A | AL021786
RAG2 | M94633
(none) | L24521
SCGF | AF020044
PRKACB | M34181
KCNN4 | AF022797
KCNN1 | U69883
MAPKAPK2 | U12779
PIN | AI540958
TOP2B | X68060
GATA2 | M68891
IL1B | X04500
PDE3B | U38178
DGKD | D73409
KIAA0993 | AB023210
ADAM10 | AF009615
IGLL1 | M27749
PDLIM1 | U90878
PRKAR1A | M33336
CD34 | S53911
GLA | U78027
BAZ1B | AF072810
EFNA1 | M57730
FADS3 | AC004770
FLT3 | U02687
LOC57228 | AF091087
BCL6 | U00115
BMP2 | M22489
CD22 | X59350
KIAA0429 | AB007889
DKFZP434C171 | AL080169
CTBP2 | AF016507
(none) | M11810
SIAT9 | AB018356
CYBB | X04011
AKR1B1 | X15414
NFKBIL1 | Y14768
UBE2V1 | U49278
DOC-1R | AF089814
BUB3 | AF047473
IL7R | M29696
ACK1 | L13738
ENIGMA | L35240
KIAA1071 | AB028994
IGL | AI932613
MN1 | X82209
KIAA0823 | AB020630
NFKB1 | M58603
CD24 | L33930
YWHAQ | X56468
VDAC1 | L06132
P85SPR | D63476
SYNGR1 | AL022326
NDR | Z35102
JMJ | AL021938
PRSC1 | D55696
MRC1 | M93221
(none) | AI184710
CRIP1 | AI017574
KIAA0056 | D29954
(none) | AF039397
(none) | U79265
SLAM | U33017
LYL1 | AC005546
KIAA0620 | AB014520
VDAC1P | AJ002428
SRP9 | AF070649
PRDX1 | X67951
SLC9A3R1 | AF015926
CD72 | M54992
ECM1 | U68186
PPP2R5A | L42373
HDGF | D16431
MERTK | U08023
(none) | L02326
CD34 | M81945
IL17R | U58917
ARL7 | AB016811
P4HA2 | U90441
BZRP | M36035
F13A1 | M14539
KRAS2 | M54968
BS69 | X86098
ORP150 | U65785
(none) | D28915
LEF1 | AL049409
SH2D1A | AL023657
LY6E | U66711
FACVL1 | D88308
EPB42 | M60298
(none) | AL049471
BMI1 | L13689
KCNJ13 | N36926
N33 | U42349
VIL2 | X51521
CCNG2 | U47414
C18ORF1 | AF009425
NUMA1 | Z11584
DBN1 | U00802
FLT3 | U02687
KIAA0854 | AB020661
MGC4175 | AI656421
KIAA1012 | AB023229
CIRBP | D78134
ST5 | U15131
KIAA0001 | D13626
CCR1 | D10925
CD19 | M28170
SNRPE | AA733050
CR2 | M26004
HEXA | W16424
IFIT4 | AF026939
(none) | W26667
EPOR | M60459
TMSNB | D82345
GCLM | L35546
H41 | H15872
TUBB2 | HG1980-HT2023
TNFAIP2 | M92357
GAB1 | U43885
PTPRK | L77886
BCL7A | X89984



Table 56: Additional Genes selected by T statistics for Hyperdiploid >50 Risk Group

Gene symbol | Accession Number
SH3BP5 | AB005047
FT | [illegible]
MX1 | M33882
NPY | AI198311
SOD1 | X02317
PTPRK | L77886
IL1B | X04500
CD9 | M38690
FLT3 | U02687
PGK1 | V00572
EFNB1 | U09303
FOS | K00650
IL1B | M15330
MRC1 | M93221
HMG14 | J02621
SNRP70 | X06815
PDLIM1 | U90878
ALOX5 | J03600
RAG2 | M94633
CALM1 | U12022
KIAA1013 | AB023230
NDUFA1 | N47307
FOS | V01512
DXS1357E | X81109
ICSBP1 | M91196
ETS2 | J04102
PCDH9 | AI524125
LILRA2 | AF025531
PSAP | J03077
SCHIP-1 | AF070614
CCND2 | D13639
KCNN1 | U69883
[illegible] | [illegible]
IGFBP4 | U20982
[illegible] | [illegible]
LOC51632 | AI557497
[illegible] | [illegible]
[illegible] | [illegible]
ATRX | U72936
APT6M8-9 | AL049929
PTPRE | X54134
GILZ | AI635895
PECAM1 | AA100961
ARHGEF4 | AB029035
ECM1 | U68186




Table 57: Additional Genes selected by T statistics for the MLL Risk Group

Gene symbol | Accession Number
EPOR | M60459
CD44 | L05424
PRKCH | M55284
MADH1 | U59423
KLF1 | U65404
MME | J03779
PTPRK | L77886
IL1B | X04500
YES1 | M15990
ARPC2 | U50523
IGFBP4 | M62403
ITPR3 | U01062
(none) | M13929
EFNB1 | U09303
FHIT | U46922
NME2 | X58965
CCND2 | X68452
MPB1 | M55914
CDH2 | M34064
IGFBP7 | L19182
ALOX5 | J03600
PTGDR | U31099
PLXNC1 | AF030339
EIF3S2 | U39067
BLVRA | X93086
HSPC022 | W68830
(none) | S67247
MYLK | U48959
SLC6A11 | S75989
(none) | X67098
SERPINB1 | M93056
LGALS1 | AI535946
HRK | D83699
(none) | AL049313
HBS1L | AB028961
KIAA0437 | AB022660
GDI2 | Y13286
ITGA4 | X16983
EEF1B2 | X60489
MD-1 | AB020499
POU4F1 | X64624
TST | X59434
PTPRF | Y00815
ARHGEF4 | AB029035
SCHIP-1 | AF070614
ASMTL | AA669799
DDR1 | L20817
N33 | U42360
CR2 | M26004
AHNAK | M80899
SCGF | AF020044
EPB49 | U28389
PSPHL | AJ001612
MADH1 | U59912
ITPR3 | U01062
DPEP1 | J05257
AKAP12 | U81607
DBI | AI557240
KIAA0736 | AB018279
MAL | X76220
S100A4 | W72186
MDK | X55110
CRK | D10656
CAPG | M94345
KCNH2 | U04270
KIAA1069 | AB028992
(none) | [illegible]
KIAA0298 | AB002296
DGKD | D73409
DEPP | AB022718
(none) | AL049957
CD8B1 | X13444
EFNB1 | U09303
(none) | AI391564
LDOC1 | AB019527
EFNA1 | M57730
CD44 | L05424
PTPRC | Y00062
PTPRC | Y00638
PTPRC | Y00638
TFPI | M59499
TSPAN-5 | AF065389
BCL11A | W27619
(none) | AJ001381
KIAA1011 | AL080133
FYB | U93049
DKFZp761F2014 | AA149431
FGFR1 | X66945
(none) | M63589
PTPN6 | X62055



Table 58: Additional Genes selected by T statistics for the Novel Risk Group

Gene symbol | Accession Number
CHST2 | AB014679
CLTC | D21260
TUBA1 | X06956
GNG11 | U31384
PCDH9 | AI524125
MDS019 | AA442560
RAG2 | M94633
ITGA6 | X53586
UBE2E3 | AB017644
CD34 | S53911
CD34 | M81945
FGFR1 | M34641
ECM1 | U68186
MADH1 | U59423
FUT7 | AB012668
PROML1 | AF027208
CSNK2A1 | M55265
FLNB | AF042166
MADH1 | U59912
LIG4 | X83441
ZNF151 | Y09723
CSF3R | M59818
(none) | AL080205
STAU2 | AL079286
AEBP1 | AF053944
KIAA0320 | AB002318
KIAA0746 | AB018289
PTPRM | X58288
IGFBP4 | M62403
ZNF266 | AA868898
PDLIM1 | U90878
MTMR3 | AB002369
TIMP1 | D11139
TTC2 | W28595
TM4SF2 | L10373
PSA | AA978353
HTR4 | Y12505
MMS19L | AF007151
(none) | AI391564
TJP2 | L27476
BMP2 | M22489
ARL7 | AB016811
TLR1 | AL050262
SMC2L1 | AF092563
TGFBR2 | D50683
TGFBR2 | D50683
SPARC | J03040
GPRK5 | L15388
CDH2 | M34064
KIAA0877 | AB020684
ABLIM | D31883
RNF3 | W25793
CCBP2 | U94888
CHN2 | U07223
ITGA4 | X16983
IQGAP2 | U51903
FLJ22531 | W80358
PIK3CD | U86453
FXYD2 | H94881
(none) | W30677
AMPD3 | U29926
(none) | D78577
KIAA0125 | D50915
FADS3 | AC004770
DKFZP434C171 | AL080169
EST00098 | AI885170
BMP2 | M22489
LILRB4 | AF072099
KIAA0429 | AB007889
DKFZP586G0522 | AL050289
(none) | U92818
ATIC | D82348
MONDOA | AB020674
CNK1 | AF100153
NGFR | M14764
KIAA0540 | AB011112
MYO10 | AB018342
PIASX-BETA | AF077954
ACVR1 | Z22534
ARHGEF10 | AB002292
PON2 | AF001601
TST | X59434
SPTBN1 | M96803
ERCC2 | AA079018
PRSC1 | D55696
DKFZP434D174 | AL080150
(none) | AI184710
CD8B1 | X13444
(none) | U79265
DKFZp761F2014 | AA149431
MEF2A | U49020
JAG2 | AF029778
ZNF143 | AF071771
CASP1 | U13697
HAP1 | AF040723
FABGL | D82061
ALDH1 | K03000
RAD9 | U53174
(none) | AL109722
CDC27 | AA166687
B4GALT1 | D29805
PTPRM | X58288
AHR | L19872
N33 | U42349
IL12RB2 | U64198
MTR | U73338
KIAA0697 | AB014597
CSNK2B | M30448
(none) | U15590
(none) | W28612
HSU79253 | AF052186
RBBP1 | S57153
S100A11 | D38583
TCF12 | M80627
(none) | AI971169
EEF1E1 | N32257
SAP18 | AW021542
PVRL1 | AF060231
(none) | M13929
MKP-L | AF038844
(none) | W26667
CD79B | M89957
KIAA0437 | AB022660
(none) | AF070633
GCLM | L35546
EDG6 | AJ000479
MAL | X76220




Table 59: Additional Genes selected by T statistics for the T-ALL Risk Group

Gene symbol | Accession Number
SLP65 | AF068180
CD3D | AA919102
SH2D1A | AL023657
CD79B | M89957
CD3E | M23323
CTGF | X78947
PFTK1 | AB020641
TRB | X00437
CD24 | L33930
CD22 | X52785
TOP2B | X68060
CD22 | X59350
TCL1A | X82240
BRAG | AB011170
CD79A | U05259
SCHIP-1 | AF070614
MAL | X76220
HLA-DQB1 | M16276
PDE4B | L20971
HLA-DQB1 | M60028
CD19 | M28170
KIAA0959 | AB023176
LILRA2 | AF025531
PTPN18 | X79568
MEF2C | L08895
PTP4A2 | U14603
NPY | AI198311
GAB1 | U43885
LCK | U23852
TCF7 | X59871
TERF2 | X93512
ITM2A | AL021786
MEF2C | S57212
SLC9A3R1 | AF015926
ENG | X72012
DEPP | AB022718
IL1B | X04500
IL1B | M15330
ECM1 | U68186
HLA-DMA | X62744
CRMP1 | D78012
WFS1 | AF084481
PRKCQ | L01087
GNG7 | AB010414
(none) | X58398
CDKN1A | U03106
CD9 | M38690
PTK2 | L13616
TRB | M12886
IFI35 | L78833
NUCB2 | X76732
KIAA0942 | AB023159
VATI | U18009
ARL7 | AB016811
USP20 | AB023220
PLCG2 | X14034
PRDX1 | X67951
POU2AF1 | Z49194
CMAH | D86324
ALOX5 | J03600
PTPN7 | M64322
MEF2C | S57212
KIAA0668 | AL021707
LOC54103 | AL079277
EFNB1 | U09303
HELO1 | AL034374
ADF | S65738
KIAA0906 | AB020713
IGFBP4 | U20982
LDHB | X13794
CTNNA1 | U03100
ENO2 | X51956
LAT | AJ223280
PTPN7 | D11327
(none) | M16942
CSRP2 | U57646
GLA | U78027
ADA | X02994
RGS10 | AF045229
KIAA0870 | AB020677
CD3Z | J04132
STATI2 | AF037989
GSN | X04412
INSR | X02160
HLA-DNA | M31525
CD72 | M54992
EPHB6 | D83492
MYLK | U48959
HLA-DQA1 | AA868382
LCK | M36881
FHL1 | AF063002
CRIM1 | AI651806
AQP3 | N74607
HLA-DQB1 | M81141
GNG11 | U31384
LARGE | AJ007583
FOXO1A | AF032885
NPR1 | X15357
GAB1 | U43885
PTPRE | X54134
PDLIM1 | U90878
NCF4 | AL008637
ARHGEF4 | AB029035
PTP4A2 | U14603
CTNNA1 | AF102803
SEPW1 | U67171
CHI3L2 | U58515
LILRA2 | U82277
CD79A | U05259
TCL1B | AB018563
TCF4 | M74719
TACTILE | M88282
(none) | AB002438
TXN | AI653621
ADE2H1 | X53793
(none) | AL049449
GLUL | X59834
ZFHX1B | AB011141
P4HB | M22806
IFITM1 | J04164
KIAA0182 | D80004
SH2D1A | AF100539
GNA11 | M69013
NCF4 | AL008637
SLC2A5 | M55531
KIAA0993 | AB023210
HLA-DPB1 | M83664
HLX1 | M60721
CTNNA1 | D14705
FADS3 | AC004770
GATA3 | X58072
GDI2 | Y13286
TM4SF2 | L10373
GNA15 | M63904
BTG2 | U72649
RAG1 | M29474
MDK | X55110
(none) | X00457
AKR1C3 | D17793
SLA | D89077
LDHA | X02152
(none) | AL049279
PTPRC | Y00638
BMP2 | M22489
ERG | M17254
ICSBP1 | M91196
CCT2 | AF026166
AKAP2 | AB023137
(none) | X58398
KIAA0128 | D50918
IGHM | X58529
NOTCH3 | U97669
JUP | M23410
DKFZP586O1624 | AL039458
MYO10 | AB018342
CTNNA1 | L23805
NOS2A | U31511
(none) | D00749
(none) | L29376
ICB-1 | AF044896
GNAI1 | AL049933
S100A11 | D38583
MAPKAPK3 | U09578
ADA | M13792
S100A13 | AI541308
VDAC3 | AF038962
(none) | AL049265
TRIM | AJ224878
CTBP2 | AF016507
F13A1 | M14539
ZNF43 | HG620-HT620
DKFZp761F2014 | AA149431
KIAA0442 | AB007902
CTNNA1 | U03100
CD2 | M16336
BMP2 | M22489
HSPC022 | W68830
ICAM3 | X69819
NCF4 | X77094
GS3955 | D87119
CTSC | X87212
GH1 | V00520
ARPC2 | U50523
HLA-DRB1 | M32578
GAS1 | L13698
LAMB2 | M55210
EPHB4 | U07695
COX8 | AI525665
KIAA0618 | N29665
KIAA0870 | AI808958
PIK3CG | X83368
IGHD | K02882
IRF4 | U52682
HSPCB | M16660
CAPN3 | X85030
CD6 | X60992
WSX-1 | AI263885
FXYD2 | H94881
PTK2 | HG3075-HT3236
FUCA1 | M29877
FADS2 | AL050118
KARS | D32053
DSCR1 | U85267
SOX4 | X70683
TRD | X73617
MHC2TA | U18259
(none) | AL049435
MDK | M94250
CALM1 | U12022
PCLO | AB011131
(none) | AI391564
FHIT | U46922
MONDOA | AB020674
TRG | M30894
SPIB | X66079
FLJ10097 | AL035494
TAGLN2 | D21261
LGALS9 | Z49107



Table 60: Additional Genes selected by T statistics for the TEL-AML1 Risk Group

Gene symbol | Accession Number
ARHGEF4 | AB029035
TNFRSF7 | M63928
PCLO | AB011131
TCFL5 | AB012124
KCNN1 | U69883
NME2 | X58965
PTPRK | L77886
(none) | AL049313
TERF2 | X93512
GNG11 | U31384
RAG1 | M29474
(none) | AL080190
MADH1 | U59423
MADH1 | U59912
(none) | HG3523-HT4899
P114-RHO-GEF | AB011093
(none) | L29254
MDK | M94250
TERF2 | AF002999
CRMP1 | D78012
HLA-DOB | X03066
NFKBIL1 | Y14768
(none) | AA216639
(none) | AL080059
CBFA2T3 | AB010419
MDK | X55110
PIK3C3 | Z46973
ALOX5 | J03600
PTP4A3 | AF041434
POU2AF1 | Z49194
POU4F1 | L20433
PRKCB1 | X07109
GCAT | Z97630
PHYH | AF023462
SPTA1 | M61877
IDI1 | X17025
FYB | U93049
ITPR1 | D26070
GTT1 | AL041780
FADS3 | AC004770
CCT2 | AF026166
ISG20 | U88964
SCHIP-1 | AF070614
DR6 | AF068868
MYO10 | AB018342
ZNF91 | L11672
T-STAR | AF051321
FUCA1 | M29877
HLA-DQB1 | M60028
(none) | AB002438
CTGF | X78947
FKBP1A | M34539
(none) | AI391564
RAB1 | AL050268
INSR | X02160
KIAA0540 | AB011112
TM4SF2 | L10373
CASP1 | M87507
MT1L | AA224832
MME | J03779
(none) | AI743299
KARS | D32053
CHN2 | U07223
IQGAP2 | U51903
KIAA0906 | AB020713
STATI2 | AF037989
HLA-DMA | X62744
CD36L1 | Z22555
PRKCB1 | X06318
GS3955 | D87119
ACTN1 | X15804
FLJ20154 | AF070644
KIAA0769 | AB018312
SDC1 | Z48199
SOX4 | X70683
NRTN | U78110
CTNND1 | AB002382
FHIT | U46922
FARP1 | AI701049
FOXO1A | AF032885
NPY | AI198311
VDUP1 | S73591
H2AFO | AI885852
TACTILE | M88282
SNL | U03057
JUP | M23410
NR3C2 | M16801
PRPS2 | Y00971
LILRA2 | AF025531
RNAHP | H68340
DPYSL2 | U97105
ITGB2 | M15395
PCDH9 | AI524125
LAIR1 | AF013249
CD79A | U05259
NFKBIL1 | Y14768
PCCA | S79219
HLA-DMB | U15085
SMARCA4 | D26156



EXAMPLE 2

To identify additional genes whose expression levels could be used as a diagnostic
tool to identify ALL subgroups, leukemic blasts from 132 diagnostic samples were
analyzed using higher density oligonucleotide arrays that allow the interrogation of
a majority of the identified genes in the human genome.

A subset of the 327 diagnostic pediatric ALL samples described above was reanalyzed
using these higher density microarrays. Case selection was based on providing a
representation of the known prognostic ALL subtypes including t(9;22)[BCR-ABL],
t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1], rearrangement in the MLL gene on chromosome
11q23, and hyperdiploid karyotype with >50 chromosomes. Since the goal was to define
expression profiles that could be used to accurately diagnose the known prognostic
subtypes of ALL, we chose to over-represent these subtypes compared to what is
normally seen in a random population of childhood leukemia patients. A total of 132
samples met these criteria and had sufficient material remaining to be used for this
analysis. The list of samples and subtype distribution of the cases used in this
study are shown in Tables 61 and 62, respectively.



Table 61. Diagnostic ALL samples used for class prediction (n=132) 



BCR-ABL-#1 | Hyperdip>50-C18 | Pseudodip-#6
BCR-ABL-#2 | Hyperdip>50-C21 | Pseudodip-C2-N
BCR-ABL-#3 | Hyperdip>50-C22 | Pseudodip-C3
BCR-ABL-#4 | Hyperdip>50-C23 | Pseudodip-C5
BCR-ABL-#5 | Hyperdip>50-C27-N | Pseudodip-C6
BCR-ABL-#6 | Hyperdip>50-C32 | Pseudodip-C7
BCR-ABL-#7 | Hyperdip>50-R4 | Pseudodip-C9
BCR-ABL-#8 | Hyperdip47-50-C14-N | Pseudodip-C14
BCR-ABL-#9 | Hyperdip47-50-C3-N | Pseudodip-C16-N
BCR-ABL-Hyperdip-#10 | Hypodip-#2 | Pseudodip-R1-N
BCR-ABL-C1 | Hypodip-2M#1 | T-ALL-#5
BCR-ABL-R1 | Hypodip-C2 | T-ALL-#6
BCR-ABL-R2 | Hypodip-C5 | T-ALL-#7
BCR-ABL-R3 | MLL-#1 | T-ALL-#8
BCR-ABL-Hyperdip-R5 | MLL-#2 | T-ALL-#10
E2A-PBX1-#5 | MLL-#3 | T-ALL-C2
E2A-PBX1-#6 | MLL-#4 | T-ALL-C6
E2A-PBX1-#9 | MLL-#5 | T-ALL-C7
E2A-PBX1-#10 | MLL-#6 | T-ALL-C11
E2A-PBX1-#12 | MLL-#7 | T-ALL-C15
E2A-PBX1-#13 | [entry missing] | T-ALL-C19
E2A-PBX1-2M#1 | MLL-2M#1 | T-ALL-C21
E2A-PBX1-C2 | MLL-2M#2 | T-ALL-R5
E2A-PBX1-C3 | MLL-[illegible] | T-ALL-R6
E2A-PBX1-C4 | MLL-[illegible] | TEL-AML1-#6
E2A-PBX1-C5 | MLL-[illegible] | TEL-AML1-#9
E2A-PBX1-C6 | MLL-[illegible] | TEL-AML1-#10
E2A-PBX1-C7 | MLL-[illegible] | TEL-AML1-#14
E2A-PBX1-C9 | MLL-[illegible] | TEL-AML1-2M#1
E2A-PBX1-C10 | MLL-[illegible] | TEL-AML1-2M#2
E2A-PBX1-C11 | MLL-[illegible] | TEL-AML1-C4
E2A-PBX1-C12 | MLL-[illegible] | TEL-AML1-C5
E2A-PBX1-R1 | MLL-R4 | TEL-AML1-C6
Hyperdip>50-[illegible] | Normal-C1-N | TEL-AML1-C26
Hyperdip>50-[illegible] | Normal-C[illegible]-N | TEL-AML1-C28
Hyperdip>50-[illegible] | Normal-C3-N | TEL-AML1-C30
Hyperdip>50-[illegible] | Normal-C4-N | TEL-AML1-C31
Hyperdip>50-[illegible] | Normal-C7-N | TEL-AML1-C32
Hyperdip>50-[illegible] | Normal-C8 | TEL-AML1-C33
Hyperdip>50-[illegible] | Normal-[illegible] | TEL-AML1-C34
Hyperdip>50-[illegible] | Normal-C11-N | TEL-AML1-C37
Hyperdip>50-[illegible] | Normal-R1 | TEL-AML1-C38
Hyperdip>50-C15 | Normal-[illegible] | TEL-AML1-C40
Hyperdip>50-C16 | Pseudodip-#5 | TEL-AML1-R3



* Subtype Name-C#: Dx sample of patient in CCR
  Subtype Name-R#: Dx sample of patient who developed a hematologic relapse
  Subtype Name-#: Dx sample used for subgroup classification only
  Subtype Name-2M#: Dx sample of patient who later developed 2nd AML
  Subtype Name-N: Dx sample in novel group

Table 62. Subgroup distribution of ALL cases

Subgroup | Train Set | Test Set
BCR-ABL | 11 | 4
E2A-PBX1 | 13 | 5
Hyperdiploid >50 | 13 | 4
MLL | 15 | 5
T-ALL | 12 | 2
TEL-AML1 | 15 | 5
Other | 21 | 7
Total | 100 | 32



A total of 26,825 probe sets from the combined Affymetrix® brand U133A and B
microarrays (Affymetrix, Inc., Santa Clara, CA) showed variation in expression levels
across the 132 diagnostic leukemia samples. In an initial analysis of these data, two
complementary unsupervised clustering algorithms, two-dimensional hierarchical
clustering and principal component analysis (PCA), were used to assess the major
sub-groupings of the leukemia cases based solely on gene expression profiles. These
unbiased clustering algorithms demonstrated that the pediatric ALL cases cluster
primarily into seven major subtypes: T-ALL and six subtypes of B-cell lineage ALL
corresponding to (1) rearrangement in the MLL gene on chromosome 11q23, (2)
t(1;19)[E2A-PBX1], (3) hyperdiploid >50 chromosomes, (4) t(9;22)[BCR-ABL], (5)
the novel subgroup, and (6) t(12;21)[TEL-AML1]. In addition, a heterogeneous group
of B-lineage cases was identified that lacked any of the defined genetic lesions and
failed to cluster into the novel subgroup. Several of these leukemia subtypes formed
distinct branches when all differentially expressed genes were used in the
two-dimensional hierarchical clustering algorithm (T-ALL, hyperdiploid >50
chromosomes, and TEL-AML1), whereas other subtypes clustered in multiple
branches, suggestive of gene expression differences within these subclasses. Using
PCA, the distinct nature of the B-cell lineage subtypes was better appreciated when the
T-ALL cases were removed from the analysis. A diagnostic accuracy of 100% was
achieved for only two of the leukemia subtypes (T-ALL and TEL-AML1), indicating the
need to use supervised learning algorithms to achieve optimal diagnostic accuracy by
gene expression profiling.

Statistical methods were used to identify probe sets that were the best
discriminators of the individual leukemia subtypes. In order to identify the genes that
provide the highest accuracy in diagnosing specific prognostic subtypes of leukemia,
the decision tree format described elsewhere herein was used for the identification of
leukemia subtypes. Briefly, we first define whether a case is T- or B-cell in lineage.
If the case is classified as T-cell, a diagnosis of T-ALL is made. If non-T, we then
determine if the case can be classified into one of the known B-cell lineage risk
groups, deciding sequentially if it is E2A-PBX1, TEL-AML1, BCR-ABL, rearranged
MLL gene, and lastly hyperdiploid with >50 chromosomes. Cases not assigned to one
of these classes are left unassigned. The use of this decision tree format directly
influences the selection of genes, allowing the selection of discriminating genes for
groups lower down the tree that might also be expressed by subtypes higher in the
tree. Using a number of different supervised learning algorithms, it was found that a
higher diagnostic accuracy is obtained using this decision tree format, as compared to
a parallel format in which each class is identified against all others.
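By way of illustration only, the sequential decision tree format described above can be sketched as follows. The function names, the "Unassigned" label, and the per-subtype classifier callables are hypothetical placeholders and are not taken from the specification; any supervised learning algorithm could stand behind each callable.

```python
from typing import Callable, Dict, List, Sequence

# Order in which subtypes are tested, following the text: T-ALL first,
# then the B-lineage groups, ending with hyperdiploid >50.
DECISION_ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdip>50"]

# A classifier takes one expression profile and answers "is it this subtype?".
Classifier = Callable[[Sequence[float]], bool]


def classify_case(profile: Sequence[float],
                  classifiers: Dict[str, Classifier],
                  order: Sequence[str] = DECISION_ORDER) -> str:
    """Walk down the tree and return the first subtype whose classifier fires."""
    for subtype in order:
        if classifiers[subtype](profile):
            return subtype
    return "Unassigned"  # hypothetical label for cases left unassigned


def training_cases_for_node(labels: Sequence[str], subtype: str,
                            order: Sequence[str] = DECISION_ORDER) -> List[int]:
    """Indices of training cases seen at a node: the subtype itself plus
    everything below it in the tree (cases of higher subtypes are excluded)."""
    higher = set(order[:list(order).index(subtype)])
    return [i for i, label in enumerate(labels) if label not in higher]


# Toy usage: each hypothetical classifier thresholds one marker value.
toy = {name: (lambda p, i=i: p[i] > 1.0) for i, name in enumerate(DECISION_ORDER)}
print(classify_case([0, 0, 2, 2, 0, 0], toy))
```

Because each node only has to separate its class from the classes below it, a gene that is also expressed by a subtype higher in the tree can still be a useful discriminator at a lower node.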

Discriminating genes were selected using a chi-square metric on the 100 cases
in the training set. Genes were selected that discriminated between a class and all
leukemia subtypes below it in the decision tree. The numbers of discriminating probe
sets per leukemia subtype at a statistical significance level of p < 0.001 (as determined
by a permutation test) were: T-ALL, 2063; E2A-PBX1, 1059; TEL-AML1, 805;
BCR-ABL, 201; MLL chimeric genes, 726; and hyperdiploid with >50 chromosomes,
994. The lists of discriminating genes obtained using the top 100 ranked probe sets for
the six prognostically important subgroups are contained in Tables 63-68. As multiple
probe sets for the same gene are present on Affymetrix microarrays, the top 100
ranked probe sets represent between 75 and 92 distinct genes, depending on the
leukemia subtype. As shown, distinct groups of either over- or under-expressed genes
distinguish cases defined by E2A-PBX1, MLL gene rearrangement, T-ALL,
hyperdiploid >50 chromosomes, BCR-ABL, and TEL-AML1.
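For illustration, one way such a chi-square metric and permutation test could be computed is sketched below. This is a minimal sketch under stated assumptions, not the procedure actually used: it assumes each probe set is discretized at a single threshold into "above" and "not above", scores the resulting 2x2 table against class membership, and estimates significance by shuffling the class labels. The threshold choice, the number of permutations, and all function names are hypothetical. One observation is consistent with this reading: under this formula a perfectly separating probe set scores the number of cases compared, and the top E2A-PBX1 values in Table 64 (88.0) match the 88 training cases that remain once the 12 T-ALL training cases of Table 62 are set aside.

```python
import random
from typing import List, Sequence


def chi_square_2x2(values: Sequence[float], in_class: Sequence[bool],
                   threshold: float) -> float:
    """Chi-square statistic of the 2x2 table (class vs. above-threshold)."""
    a = b = c = d = 0
    for value, is_member in zip(values, in_class):
        above = value > threshold
        if is_member and above:
            a += 1          # class, above threshold
        elif is_member:
            b += 1          # class, not above
        elif above:
            c += 1          # non-class, above threshold
        else:
            d += 1          # non-class, not above
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0          # a margin is empty, so the probe set carries no signal
    return n * (a * d - b * c) ** 2 / denom


def permutation_p_value(values: Sequence[float], in_class: Sequence[bool],
                        threshold: float, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of label shufflings scoring at least as high as the observed table."""
    rng = random.Random(seed)
    observed = chi_square_2x2(values, in_class, threshold)
    labels: List[bool] = list(in_class)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if chi_square_2x2(values, labels, threshold) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0


# Toy usage: four class cases with high signal, four non-class cases with low signal.
signal = [5, 6, 7, 8, 1, 2, 1, 2]
member = [True, True, True, True, False, False, False, False]
cutoff = sum(signal) / len(signal)
print(chi_square_2x2(signal, member, cutoff))
```

Probe sets would then be ranked by the statistic within each node of the decision tree, using only the training cases visible at that node.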

The following tables contain a list of the top 100 probe sets for each diagnostic
subtype, ranked by their chi-square value. Each table contains the Affymetrix® U133
series probe set number, a gene description, gene symbol, chromosomal location, and
primary GenBank reference. Chi-square values were calculated utilizing only the
samples in the training set in a differential diagnosis decision tree format. The
calculation of the fold change was done in a parallel format using the total data set
and comparing the mean signal value in the class versus the mean signal value in the
non-class.
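As a worked illustration of the parallel-format fold change, the sketch below compares the mean signal in the class with the mean signal in all other samples. It assumes, from the way the tables pair an "Above" or "Below" entry with a fold value of at least 1, that the fold change is reported as the larger mean divided by the smaller; that convention and the function name are assumptions rather than statements of the method actually used.

```python
from typing import Sequence, Tuple


def fold_change(values: Sequence[float], in_class: Sequence[bool]) -> Tuple[str, float]:
    """Return (direction, fold) for one probe set across the total data set."""
    class_vals = [v for v, k in zip(values, in_class) if k]
    other_vals = [v for v, k in zip(values, in_class) if not k]
    if not class_vals or not other_vals:
        raise ValueError("both class and non-class samples are required")
    mean_class = sum(class_vals) / len(class_vals)
    mean_other = sum(other_vals) / len(other_vals)
    direction = "Above" if mean_class >= mean_other else "Below"
    high, low = max(mean_class, mean_other), min(mean_class, mean_other)
    if low <= 0:
        raise ValueError("mean signal must be positive to form a ratio")
    return direction, high / low


# Toy usage: class mean 300 versus non-class mean 100.
print(fold_change([200, 400, 100, 100], [True, True, False, False]))
```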

Table 63. Top 100 chi-square probe sets selected for BCR-ABL

Rank | U133 probe set | Gene description | Gene symbol | Chromosomal location | GenBank reference | Chi-square value | Bcr above/below mean | Fold change


1 


241812 at 


EST FLJ39877 


FLJ39877 


2 


AV648669 


47.4 


Above 




2 


201876_at 


Paraoxonase/ 


PON2 


7q21.3 


NM_000305.1 


47.2 


Above 


lo. / 






arylesterase 2 








44.3 


Above 


z.o 


3 


201028_s_at 


Antigen identified 


MIC2 


Xp22.32 


U82 164.1 






by monoclonal 


















antibodies 12E7, 


















F21 and013 














4 


200953_s_at 


Cyclin D2 


CCND2 


12pl3 


NM 001759. 1 


42.3 


Above 


5 


202947_s__at 


Glycophorin C 


GYPC 


2ql4-q21 


NM_002101.2 


42.3 


Above 


3.1 






integral membrane 


















glycoprotein 










Above 


4.3 


6 


223449 at 


Semaphorin 6A 


SEMA6A 


5q23.1 


AF225425.1 


42.3 


7 


201029_s_at 


Antigen identified 


MIC2 


Xp22.32 


NM_002414.1 


41.2 


Above 


2.4 






by monoclonal 


















antibodies 12E7, 


















F21 and 013 










Above 




8 


204429_s_at 


Solute carrier 


SLC2A5 


lp36.2 


BE560461 


41.2 


5 






family 2 


















(facilitated 


















glucose/fructose 


















transporter), 


















member 5 












23. o 


9 


210830 s at 


Paraoxonase 


PON2 


7q21.3 


AF00 1602.1 


41.2 


Above 


10 


215028 at 


Semaphorin 6 A 


SEMA6A 


5 


AB002438.1 


41.2 


Above 


4.5 


11 


220024_s_at 


Periaxin 


PRX 


19ql3.13 


NM_020956.1 


41.2 


Above 


5.2 










-ql3.2 








43.4 




ZU17UO S al 


UYA79 nrntein 


HYA22 


3p21.3 


NM 005808.1 


41.1 


Above 


13 


209365_s__at 


Extracellular 


ECM1 


lq21 


U65932.1 


41.1 


Above 


6 






matrix protein 1 










Above 


10.9 


14 


238689_at 


GPR110G 


GPR110 


6 


BG426455 


41.1 






protein-coupled 


















receptor 110 










Above 


12.4 


Id 


zZziD4_s_at 


DKFZP56 


2q33.1 


AK002064.1 


40.4 






DKFZP564A2416 


4A2416 
















unknown protein 


















with a histone H5 


















signature. 










Above 


1.5 


16 


218084_x_at 


FXYD domain- 


FXYD5 


19ql2- 


NM_0 14 164.2 


38 






containing ion 




ql3.1 














transport regulator 














17 


212242_at 


5 

Tubulin, alpha 1 


TUBA1 


2q36.2 


AL565074 


37 


Above 


3.2 






(testis specific) 








36.3 


Above 


10.8 


18 


201445 at 


Calponin3, acidic 


CNN3 


Ip22-p21 


NM 001839.1 


19 


20277 l_at 


K1AA0233 gene 




16q24.3 


NM_014745.1 


36.3 


Above 


1.9 






product 


KIAA023 
3 












20 


212298_at 


Neuropilin 1 


NRP1 


10pl2 


BE620457 


36.3 


Above 


13.8
21 212458_at

22 22248S_s_at 

23 222762_x_at 

24 20095 l_s_at 

25 204430 s at 



26 205467_at 

27 225660_at 

28 225913 at 



29 236489_at 

30 240173_at 

31 240499_at 

32 201310 s at 



33 215617_at 

34 242579_at 

35 202717 s at 



36 205055_at 



37 217967_s_at 

38 201656_at 

39 207196_s__at 

40 219315_s_at 

41 202123 s at 



42 219938_s_at 

43 228046_at 

44 64064_at 

45 222729__at 



FLJ21897 

Dynactin 4 

LIM domains 

containing 1 

Cyclin D2 

Solute carrier 

family 2 

(facilitated 

glucose/fructose 

transporter), 

member 5 

Caspase 10 

Semaphorin 6A 

FLJ21140 

(Ser/Thr protein 

kinase) 

EST 

EST 

EST 

P311 protein. 

Similar to 

gastrin/cholecysto 

kinin type B 

receptor. 

FLJ11754 

EST 

CDC16cell 
division cycle 16 
homolog 
Integrin, alpha E 
(antigen CD 103, 
human mucosal 
lymphocyte 
antigen 1) 
Chromosome 1 
ORF 24 

Integrin, alpha 6 
Nef-associated 
factor 1 
hypothetical 
protein FLJ23058 
V-abl Abelson 
murine leukemia 
viral oncogene 
homolog 1 
Pro-Ser-Thr 
phosphatase 
interacting protein 
2 

EST;DKFZp434P 

0235 

Immune 

associated 

nucleotide 4 like 1 

F-box and WD-40 

domain protein 7 

(archipelago 

homolog, 

Drosophila) 



FLJ21897 2 AW138902 36.3 Above 2.4 

DCTN4 5q31-q32 BE218028 363 Above 3.6 
LIMD1 3p21.3 AU144259 36.3 Above 2.6 



CCND2 


12pl3 


NM 001759.1 




Above 


19 7 


SLC2A5 


lp36.2 


NM_003039.1 


35.3 


Above 


5.1 


C ASP 10 


2q33-q34 


NM 001230.1 


35.3 


Above 


3.6 


SEMA6A 


5q23.1 


W92748 


35.3 


Above 


3.3 


FLJ21140 


15 


AK025 943.1 


35.3 


Above 


9 Q 




6 


AI282097 


35.3 


Above 


10. / 




4 


AI732969 


35.3 


Above 


10. i 




10 


AA482221 


35.3 


Above 


1.3 


P311 


5q21.3 


NM_004772.1 


ICO 

35.2 


Below 


9 9 


FLJ11754 


2 


AU145711 


35.2 


Above 


14.4 




4 


AA935461 


35.2 


Above 


10.2 


CDC 16 


13q34 


NM_003903.1 


34.4 


Above 


1.1 


ITGAE 


17pl3 


NM__00220S.3 




rseiow 


9 1 


Clorf24 


lq25 


AF288391.1 


34.4 


Above 


3.2 


ITGA6 


2q31.1 


NM 000210.1 


33.9 


Above 


2.8 


NAF1 


5q32- 


NMJ)0605S.l 


32.2 


Above 


1.4 




q33.1 








5.3 


FLJ20898 


16pl3.12 


NM_024600.1 


32.2 


Above 


ABL1 


9q34.1 


JNJV1 UUjIO/.x 






1.8 


PSTPEP2 


18ql2 


NM_024430.1 


31.2 


Above 


5 


DKFZp4 


4 


AA741243 


31.2 


Above 


1.1 


34P0235 










3.3 


IAN4L1 


7q36 


AI435089 


30.9 


Above 


FBXW7 


4q31.23 


BE551877 . 


30.5 


Above 


2.4
46


229975_at 


EST 




4 


AI826437 


30.5 


Above 


9.1 


47 


200864_s_at 


RAB11A 


RAB11A 


15q2L3- 


NMJXH663.1 


29.7 


Above 


1.4 










q22.31 










48 


203089_s_at 


Protease, serine, 
25 


PRSS25 


2pl2 


NMJH3247.1 


29.7 


Above 


1.7 


49 


205376_at 


Inositol 


INPP4B 


4q3Ll 


NM_003866.1 


29.7 


Above 


12.4 






polyphosphates- 


















phosphatase, type 
II 














50 


209229_s_at 


KIAA1115 




19ql3.42 


BC002799.1 


29.7 


Above 


1.3 






protein 


KIAA111 

5 












51 


219871_at 


Hypothetical 


FLJ13197 


4pl4 


NM_024614.1 


29.7 


Above 


14.5 






protein FLJ13197 














52 


222868_s_at 


Interleukin 18 


IL18BP 


llql3 


AI521549 


29.7 


Above 


7.1 






binding protein 














53 


235988_at 


GPR110G 


GPR110 


6pl2.3 


AA746038 


29.7 


Above 


15.8 






protein-coupled 


















receptor 1 10 














54 


239273_s_at 


Matrix 


MMP28 


17qll- 


AI927208 


29.7 


Above 


90.5 






metalloproteinase 
28 




q21.1 










55 


206150_at 


Tumor necrosis 




12pl3 


NM_00 1242.1 


29.5 


Above 


3.2 






factor receptor 


TNFRSF7 
















superfamily, 


















member 7 














56 


212203_x_at 


Interferon induced 


IFITM3 


8ql3.1 


BF338947 


29.5 


Above 


2.3 






transmembrane 
















protein 3 














57 


217110 s at 


Mucin 4 


MUC4 


3q29 


AJ242547.1 


29.5 


Above 


47.5 


58 


223075_s_at 


hypothetical 


FLJ12783 


9q34.13- 


AL136566.1 


29.5 


Above 


3.9 






protein FLJ12783 




q34.3 










59 


229139 at 


EST 




8 


AI202201 


29.5 


Above 


10.8 


60 


229367_s_at 


Hypothetical 


FLJ22690 


7 


AW130536 


29.5 


Above 


3.6 






proteins 


















FLJ22690. 














61 


213093 at 


FLJ30869 


FLJ30869 


Xq28 


AI471375 


29.1 


Above 


2.5 


62 


216033_s_at 


FYN oncogene 


FYN 


6 


S74774.1 


29.1 


Above 


2.7 






related to SRC 














63 


202369_s_at 


TRAM-like 


KIAA005 


6p21.1- 


NM 012288.1 


28.7 


Above 


3.3 






protein 


7 


pl2 










64 


212592_at 


immunoglobulin J 


IGJ 


4q21 


AV733266 


28.7 


Above 


7.9 






polypeptide, linker 


















protein for 


















immunoglobulin 


















alpha and mu 


















polypeptides 














65 


219218_at 


hypothetical 


FLJ23058 


17q25.3 


NM_024696.1 


28.7 


Below 


6.2 






protein FLJ2305S 














66 


24205 l_at 


EST 




Y 


AI695695 


28.7 


Above 


2.2 


67 


200655_s_at 


Calmodulin 1 


CALM1 


14q24- 


NM_006888.1 


28.5 


Above 


1.3 






(phosphorylase 




q31 














kinase, delta) 














68 


202794_at 


Inositol 


INPP1 


2q32 


NM_002 194.2 


2S.4 


Above 


1.6 






polyphosphate- 1 - 
















phosphatase 














69 


218348 s at 


HSPC055 protein 


HSPC055 


16pl3.3 


NM 014153.1 


27.7 


Below 


1.1 


70 


205269_at 


Lymphocyte 


LCP2 


5q33.1- 


AI123251 


26.9 


Above 


1.6 






cytosolic protein 2 




qter
71 238488_at

72 202242_at 

73 218764_at 

74 224811_at 

75 225799_at 

76 228297_at 

77 203508_at 

78 20807 l_s_at 

79 20932 l_s_at 

80 226345_at 

81 200863_s_at 

82 205270_s_at 

83 208881_x_at 

84 212862 at 



85 213385_at 

86 218013_x_at 

87 218966_at 

88 200742 s at 



89 203217_s_at 

90 205259 at 



91 220684_at 

92 225244 at 



Ran binding 
protein 1 1 

Transmembrane 4 
superfamily 
member 2 
Hypothetical 
protein MGC5363 
FLJ30652 
Hypothetical 
protein MGC4677 
Calponin 3, acidic 
Tumor necrosis 
factor receptor 
superfamily, 
member IB 
Leukocyte- 
associated Ig-like 
receptor 1 
Adenylate cyclase 
3. 

DKFZp43401317 



RAB11A, member 
RAS oncogene 
family 
Lymphocyte 
cytosolic protein 2 
Isopentenyl- 
diphosphate delta 
isomerase 
CDP- 

diacylglycerol 

synthase 

(phosphatidate 

cytidylyltransferas 

e)2 

Chimerin 2 
Dynactin 4 
Myosin 5C 
Ceroid- 
lipofuscinosis, 
neuronal 2, late 
infantile (Jansky- 
Bielschowsky 
disease). A 
pepstatin- 
insensitive 
lysosomal 
peptidase. 
Sialyltransferase 9 
Nuclear receptor 
subfamily 3, 
group C, member 
2 

T-box21 
IMAGE3451454: 
GRASP protein 





5ql2.2 


BF511602 


26. y 


Above 


0 *7 


LOC5119 












4 

TM4SF2 


Xqll.4 


NM_004615.1 


26.6 


Above 


1.7 




14q22.1- 


NM_024064.1 


26.6 


Above 


1.7 


MGC5363 
FLJ30652 


q22.3 
3 

2ql2.3 


BFl 12093 
BF209337 


zo.o 
26.6 


Above 
Above 


i s 
2.2 


MGC4677 
CNN3 


Ip22-p21 
lp36.3- 


AI807004 
NM_00 1066.1 


26.6 
26 


Above 
Above 


4.7 
2.6 


TNFRSF1 
B 


p36.2 










LAIR1 


19ql3.4 


NM_021708.1 


26 


Above 


2 


ADCY3 


2p24-p22 


AF033861.1 


26 


Above 


2.1 




10 


AW270158 


26 


Below 


1.4 



DKFZp43 
401317 

RAB11A 15q21.3- 
q22.31 

LCP2 5q33.1- 
qter 

IDI1 10pl5.3 



AI215102 

NM_005565.2 
BC005247.1 



25.8 

25.8 
25.S 



Above 

Above 
Below 



CDS2 20pl3 AL568982 25.8 Above 



CHN2- 7 

DCTN4 5q31-q32 

MY05C 15q21 

CLN2 llpl5 



AK026415.1 
NMJH6221.1 
NM_01S728.1 
BG231932 



25.8 
25.8 
25.8 
25 



Above 
Above 
Above 
Above 



SIAT9 2pll.2 
NR3C2 4q31.1 



TBX21 17q21.2 
EV1AGE34 lq42.13 
51454 




NMJ)03896.1 
NM 000901.1 



NM_013351.1 
AA019893 



25 
25 



25 
25 



Above 
Above 



Above 
Above 



1.4 

1.6 
1.7 

1.8 



3 
3.6 
1.8 
1.5 



1.8 
1.9
93
94 



96 
97 



239519_at 
203005 at 



95 200665_s_at 



204004_at 
204576 s at 



EST 

Lymphotoxin beta 
receptor (TNFR 
superfamily, 
member 3) 
Secreted protein, 
acidic, cysteine- 
rich (osteonectin) 
PRKC, apoptosis, 
WT1, regulator 
KIAA0643 
protein 



98 214255_at ATPase, Class V, 

type 10C 

99 216985_s_at Syntaxin3A 

100 48106 at FLJ20489 



LTBR 



SPARC 



PAWR 



KIAA064 
3 

ATP10C 



10 

12pl3 



5q31.3- 
q32 

12q21 

16pl2.3 



15qll- 
ql3 

STX3A llql2.3 
FLJ20489 12pll.l 



AA927670 
NM 002342.1 


25 
24.3 


Above 
Above 


18.2 
10 


NM_003 118.1 


24.3 


Above 


9.8 


AI336206 


24.3 


Above 


3 


AA207013 


24.3 


Above 


2 


AB011138.1 


24.3 


Above 


9.9 


AJ002077.1 
H14241 


24.3 
24.3 


Above 
Above 


12 

2.8 



Table 64. Top 100 chi-square probe sets selected for E2A-PBX1

Rank | U133 probe set | Gene description | Symbol | Chromosomal location | GenBank reference | Chi-square value | E2A above/below mean | Fold change
1 | 201579_at | FAT tumor suppressor homolog 1 (Drosophila) | FAT | 4q34-q35 | NM_005245.1 | 88.0 | Above | 9.9
2 | 201695_s_at | nucleoside phosphorylase | NP | 14q13.1 | NM_000270.1 | 88.0 | Above | 3.8
3 | 204674_at | lymphoid-restricted membrane protein | LRMP | 12p12.3 | NM_006152.1 | 88.0 | Above | 5.8
4 | 205253_at | pre-B-cell leukemia transcription factor 1 | PBX1 | 1q23 | NM_002585.1 | 88.0 | Above | 3549.2
5 | 212148_at | pre-B-cell leukemia transcription factor 1, splice variant | PBX1 | 1q23 | BF967998 | 88.0 | Above | 5283.5
6 | 212151_at | pre-B-cell leukemia transcription factor 1, splice variant | PBX1 | 1q23 | BF967998 | 88.0 | Above | 7472.2
7 | 212371_at | DKFZp586C1019 | DKFZp586C1019 | 1 | AL049397.1 | 88.0 | Above | 2.5
8 | 219155_at | retinal degeneration B beta | RDGBB | 17q24.2 | NM_012417.1 | 88.0 | Above | 2.7
9 | 225483_at | hypothetical protein MGC10485 | MGC10485 | 11q25 | AI971602 | 88.0 | Above | 7.7
10 | 227439_at | E2a-Pbx1-associated protein | EB-1 | 12 | AW005572 | 88.0 | Above | 269.8






11 | 227949_at | Q9H4T4 like | H17739 | 20q13.32 | AL357503 | 88.0 | Above | 59.3
12 | 230306_at | hypothetical protein MGC10485 | MGC10485 | 11q25 | AA514326 | 88.0 | Above | 19.2
13 | 231095_at | retinal degeneration B beta | RDGBB | 17q24.2 | AW193811 | 88.0 | Above | 25.6
14 | 203372_s_at | STAT induced STAT inhibitor-2 | SOCS2 | 12q | AB004903.1 | 80.6 | Below | 23.4
15 | 206028_s_at | c-mer proto-oncogene tyrosine kinase | MERTK | 2q14.1 | NM_006343.1 | 80.6 | Above | 23.7
16 | 206181_at | signaling lymphocytic activation molecule | SLAM | 1q22-q23 | NM_003037.1 | 80.6 | Above | 6.3
17 | 208788_at | homolog of yeast long chain polyunsaturated fatty acid elongation enzyme 2 | HELO1 | 6p21.1-p12.1 | AL136939.1 | 80.6 | Above | 2.2
18 | 209760_at | KIAA0922 protein | KIAA0922 | 4q31.23 | AL136932.1 | 80.6 | Above | 2.9
19 | 35974_at | lymphoid-restricted membrane protein | LRMP | | U10485 | 80.6 | Above | 6.2
20 | 38340_at | huntingtin interacting protein 12 | HIP12 | 12q24 | AB014555 | 80.6 | Above | 3.8
21 | 208644_at | ADP-ribosyltransferase (NAD+; poly (ADP-ribose) polymerase) | ADPRT | 1q41-q42 | M32721.1 | 80.2 | Above | 3.0
22 | 212789_at | KIAA0056 protein | KIAA0056 | 11q25 | AI796581 | 80.2 | Above | 3.9
23 | 221113_s_at | wingless-type MMTV integration site family, member 16 | WNT16 | 7q31 | NM_016087.1 | 80.2 | Above | 2547.6
24 | 224022_x_at | wingless-type MMTV integration site family, member 16 | WNT16 | 7q31 | AF169963.1 | 80.2 | Above | 569.1
25 | 231040_at | EST | | 9 | AW512988 | 80.2 | Above | 16.4
26 | 232289_at | FLJ14167 | FLJ14167 | 17 | BF237871 | 80.2 | Above | 144.1
27 | 235666_at | EST | FLJ20489 | 10 | AA903473 | 80.2 | Above | 654.6
28 | 203373_at | STAT induced STAT inhibitor-2 | SOCS2 | 12q | NM_003877.1 | 74.2 | Below | 24.8
29 | 210785_s_at | basement membrane-induced gene | ICB-1 | 1p35.3 | AB035482.1 | 74.2 | Below | 4.1
30 | 224733_at | chemokine-like factor super family 3 | CKLFSF3 | 16q23.1 | AL574900 | 74.2 | Below | 41.7
31 | 225235_at | hypothetical protein MGC14859 | MGC14859 | 5q35.3 | AW007710 | 74.2 | Above | 3.6

32 204114_at

33 211913_s_at

34 219551_at

35 223693_s_at

36 200600_at

37 213909_at

38 221669_s_at

39 235911_at

nidogen 2 N1D2 

(osteonidogen) 

c-mer pro to- MERTK 

oncogene tyrosine 

kinase 

uncharacterized BM040 

bone marrow 

protein BM040 

hypothetical FLJ 10324 

protein FLJ10324 

mo e sin MSN 



14q21- NM_007361.1 73.1 Above 15.1 
q22 

2ql4.1 L08961.1 72.8 Above 37.7 



3q21.1 NM_018456.1 72.8 Above 3.0 



7p22 



AL136731.1 72.8 



40 243533_x_at 

41 20261 5_at 

42 204774_at 

43 218283_at 

44 209130_at 

45 228580_at 

46 202796_at 

47 218640_s_at 

48 235099_at 

49 2018S9_at 

50 202106_at 

51 202208_s_at 

52 205173_x_at 



FLJ 12280 
acyl-Coenzyme A 
dehydrogenase 
family, member 8 
ESTs, Weakly 
similar to PIHUB6 
salivary proline- 
rich protein 
precursor PRB1 
(large allele) 
ESTs 

DKFZp686D0521 

ecotropic viral 
integration site 2A 
synovial sarcoma 
translocation gene 
on chromosome 
18-like2 
synaptosomal- 
associated protein, 
23kDa 

serine protease 

HTRA3 

synaptopodin 

phafin 2 

ESTs, Weakly 

similar to 

PLLP_HUMAN 

Plasmolipin 

[H.sapiens] 

family with 

sequence 

similarity 3, 
member C 
golgi autoanrigen, 
golgin subfamily 
a, 3 

ADP-ribosylation 
factor-like 7 
CD58 antigen, 
(lymphocyte 
function- 
associated antigen 



FLJ12280 
ACAD8 



Xqll.2- NM_002444.1 72.5 
ql2 

3 AU147799 72.5 

llq25 BC001964.1 72.5 



AI885815 



DKFZp68 

6D0521 

EVI2A 

SS18L2 



H09663 
BF222895 



72.5 
68.6 



17qll.2 NMJH4210.1 6S.6 
3p21 NM_016305.1 68.6 



HTRA3 

KIAA102 
9 

FLJ13187 



FAM3C 



Above 

Below 

Above 
Above 



72.5 Above 



Above 
Below 

Below 

Above 



SNAP23 15ql4 BC003686.1 67.8 Below 



4pl6.1 AI828007 66.6 Above 

5q33.1 NM_007286.1 66.5 Above 

8q21.3 NM_024613.1 66.5 Above 

3 AW080832 66.5 Above 



7q22.1- NM_014888.1 65.3 Above 
q31.1 



ARL7 2q37.2 BC001051.1 65.3 
CD58 lpl3 NM_001779.1 65.3 




Above 
Above 



65.6 

2.2 

12.5 
2.6 

36.6 



23.2 
6.2 

3.0 

1.6 



1.9 

3.8 
52.3 

3.1 

6.7 



4.6 



GOLGA3 12q24.33 NM_005895.1 65.3 Above 3.3 



3.2 
2.4 






3) 

53 211744_s_at CD58 antigen, CD58 lpl3 BC005930.1 65.3 Above 2.5 
(lymphocyte 
function- 
associated antigen 
3) 



54 


212552 at 


hippocalcin-like 1 


HPCAL1 


2p25.1 


BE617588 


65.3 


Below 


2.6 


55 


213358_at 


KIAA0SO2 


KIAA080 


18pll.21 


ABO 18345.1 


65.3 


Above 


12.7 






protein 


2 












56 


222699 s at 


phafin 2 


FLJ13187 


8q21.3 


BF439250 


65.3 


Above 


3.5 


57 


225618 at 


EST 




17 


AI7695S7 


65.3 


Below 


5.3 


58 


238778_at 


DKFZp451L157 


DKFZp45 


10 


AI244661 


65.3 


Above 


23.5 








1L157 












59 


239427 at 


ESTs 




1 


AA131524 


65.3 


Above 


13.7 


60 


47069_at 


Rho GTPase ARHGAP 


22ql3.31 


AA533284 


65.3 


Above 


3.3 






activating protein 8 

Q 












61 


205769_at 


solute carrier 


SLC27A2 


15q21.2 


NM_003645.1 


65.1 


Above 


56.0 






family 27 (fatty 
















acid transporter), 


















member 2 














62 


210786_s_at 


Friend leukemia 


flu 


llq24.1- 


M93255.1 


65.1 


Above 


2.2 






virus integration 1 




q24.3 










63 


2129S5_at 


DKFZp434E033 


DKFZp43 


4 


BF1 15739 


65.1 


Above 


7.1 








4E033 












64 


22744 l_s_at 


E2a-Pbxl- 


EB-1 


12 


AW005572 


65.1 


Above 


1139.4 






associated protein 














65 


23426 l_at 


DKFZp761M1012 DKFZp76 


12 


AL137313.1 


65.1 


Above 


960.8 






1 


1M10121 












66 


244565 at 


ESTs 




10 


AI6S5824 


65.1 


Above 


7.6 


67 


202181_at 


KIAA0247 gene 


KIAA024 


14q24.1 


NM_014734.1 


63.7 


Above 


1.8 






product 


7 












68 


202207_at 


ADP-ribosylation 


ARL7 


2q37.2 


NM_005737.2 


63.7 


Above 


3.2 






factor-like 7 














69 


20757 l_x_at 


basement 


ICB-1 


lp35.3 


NM 004848.1 


63.7 


Below 


4.4 






membrane- 
















induced gene 














70 


209558_s_at 


huntingtin 


HIP 12 


12q24 


AB0133S4.1 


61.1 


Above 


23.8 






interacting protein 
12 














71 


213005_s__at 


KIAA0172 


KIAA017 


9p24.3 


D79994.1 


61.1 


Above 


8.3 






protein 


2 












72 


236854_at 


cDNA 


DKFZp66 


20 


AA743694 


61.1 


Above 


12.6 






DKFZp667F0617 


7F0617 












73 


226233_at 


tubulin-specific 


TBCE 


lq42.3 


BG112197 


60.0 


Above 


2.6 






chaperone e 














74 


203435_s_at 


membrane 


MME 


3q25.1- 


NM_0072S7.1 


59.9 


Below 


2.2 






metallo- 




q25.2 














endopeptidase 


















(neutral 


















endopeptidase, 


















enkephalinase, 


















CALLA, CD10) 














75 


202478 at 


GS3955 protein 


GS3955 


2p25.1 


NM 021643.1 


59.3 


Above 


4.0 


76 


202479 s at 


GS3955 protein 


GS3955 


2p25.1 


BC002637.1 


59.3 


Above 


3.3 


77 


203999_at 


synaptotagmin I 


SYT1 


12cen- 


NM_005639.1 


59.3 


Above 


3.9 










q21 










78 


212149_at 


KIAA0143 


KIAA014 


8q24.12 


AA805651 


59.3 


Below 


13.5 



protein 3 


79 212873 at minor 



80 218346_s_at 

81 224856_at 

82 20081 l_at 

83 201722 s at 



binding FKBP5 
CIRBP 



84 22371 l_s_at 

85 233273_at 

86 201460 at 



87 20242 l_at 

88 217983_s_at 

89 218087_s_at 

90 218491_s_at 

91 201S25_s_at 

92 202206_at 

93 218683_at 

94 226590_at 

95 227440_at 

96 229770_at 

97 40148 at 



98 212959_s_at 

99 203143_s_at 

100 209683 at 



HA-1 
histocompatibility 
antigen HA-1 
p53 regulated PA26 
PA26 nuclear 
protein 
FK506 
protein 5 

cold inducible 
RNA binding 
protein 

UDP-N-acetyl- 
alpha-D- 

galactosamine:pol 
ypeptide N- 
acetylgalactosami 
nyltransferase 1 
(GalNAc-Tl) 
HSPC144 protein 
cDNA FLJ12010 
fis 

mitogen-ac ti vated 
protein kinase- 
activated protein 
kinase 2 

immunoglobulin 
superfamily, 
member 3 
ribonuclease 6 
precursor 
sorbin and SH3 
domain containing 
1 

HSPC144 protein 
CGI-49 protein 



19pl3.3 
6q21 



6p21.3- 

21.2 

19pl3.3 



BE349017 

NM_014454.1 

AL122066.1 
NM 001280.1 




593 Below 2.9 

59.3 Below 4.7 

59.3 Below 5.5 
59.1 Below 5.8 



GALNT1 18ql2.1 NM_020474.2 59.1 Below 1.8 



HSPC144 
FLJ12010 

MAPKAP 
K2 



IGSF3 



RNASE6P 
L 

SORBS 1 



ADP-ribosylation 
factor-like 7 
polyp yrimidine 
tract binding 
protein 2 

cDNA clone 
EUROIMAGE 
1517766 
E2a-Pbxl- 
associated protein 
hypothetical 
protein FLJ3 1978 
amyloid beta (A4) 
precursor protein- 
binding, family B, 
member 2 (Fe65- 
like) 

MGC4170 protein 
KIAA0040 gene 
product 
hypothetical 
protein 

DKFZp566A1524 



HSPC144 
LOC5109 
7 

ARL7 



PTBP2 



llq25 
1 

lq32 



lpl3 

6q27 

10q23.3- 
q24.1 

llq25 
lq44 

2q37.2 

lp22.11- 
p213 



AF182413.1 


59.1 


Above 


2.0 


AU146834 


59.1 


Above 


30.6 


AI141802 


57.9 


Above 


2.1 



AB007935.1 57.9 Above 



NM_003730.2 
NM_015385.1 

NM_0 14 174.1 
AL572542 

NM_005737.2 

NM 021190.1 



57.9 Below 
57.9 Above 



57.9 
57.8 

57.8 

57.8 



Above 
Above 

Above 

Above 



4.4 

3.4 
25.1 

1.4 
2.2 

3.9 

1.8 

3.1 



EB-1 

FLJ31978 

APBB2 



MGC4170 
KIAA004 
0 

DKFZP56 
6A1524 



12 

12q24.33 
4pl4 



12q23.1 
lq24-25 



AA031404 57.8 Above 
AW005572 57.8 Above 1168.9 



2p24.2 



AI041543 


57.8 


Above 


51.8 


U62325 


57.8 


Above 


6.2 


AK001821.1 


57.2 


Below 


3.0 


T79953 


56.3 


Above 


2.4 


AA243659 


56.3 


Below 


10.0 






Table 65. Top 100 chi-square probe sets selected for Hyperdiploid >50 



Rank | U133 probe set | Gene description | Symbol | Chromosomal location | GenBank reference | Chi-square value | HD above/below mean | Fold change
1 | 200600_at | Moesin (membrane-organizing extension spike protein) | MSN | Xq11.2-q12 | NM_002444.1 | 34.0 | Above | 1.9
2 | 200737_at | Phosphoglycerate kinase 1 | PGK1 | Xq13 | NM_000291.1 | 34.0 | Above | 1.8
3 | 200980_s_at | Pyruvate dehydrogenase (lipoamide) alpha 1 | PDHA1 | Xp22.2-p22.1 | NM_000284.1 | 34.0 | Above | 1.7
4 | 201136_at | Proteolipid protein 2 (colonic epithelium-enriched) | PLP2 | Xp11.23 | NM_002668.1 | 34.0 | Above | 3.3
5 | 201807_at | Vacuolar protein sorting 26 (yeast) | VPS26 | 10q21.1 | NM_004896.1 | 34.0 | Above | 1.7
6 | 202214_s_at | Cullin 4B | CUL4B | Xq23 | NM_003588.1 | 34.0 | Above | 1.9
7 | 202557_at | Stress 70 protein chaperone, microsome associated, 60 kD | STCH | 21q11 | AI718418 | 34.0 | Above | 2.0
8 | 202593_s_at | membrane interacting protein of RGS16 | MIR16 | 16p12-p11.2 | NM_016641.1 | 34.0 | Below | 1.6
9 | 203680_at | Protein kinase, cAMP-dependent, regulatory, type II, beta | PRKAR2B | 7q22-q31.1 | NM_002736.1 | 34.0 | Above | 3.3
10 | 204194_at | BTB and CNC homology 1, basic leucine zipper transcription factor 1 | BACH1 | 21q22.11 | NM_001186.1 | 34.0 | Above | 1.8
11 | 205324_s_at | FtsJ homolog 1 (E. coli) | FTSJ1 | Xp11.23 | NM_012280.1 | 34.0 | Above | 2.1
12 | 208598_s_at | Upstream regulatory element binding protein 1 | UREB1 | Xp11.22 | NM_005703.2 | 34.0 | Above | 1.6
13 | 208861_s_at | Alpha thalassemia/mental retardation syndrome X-linked (RAD54 homolog, S. cerevisiae) | ATRX | Xq13.1-q21.1 | U72937.2 | 34.0 | Above | 1.7
14 | 211342_x_at | trinucleotide repeat containing 11 (THR-associated protein, 230 kDa subunit) | TNRC11 | Xq13 | BC004354.1 | 34.0 | Above | 1.8




15 | 216071_x_at | Trinucleotide repeat containing 11 | TNRC11 | Xq13 | AF132033 | 34.0 | Above | 1.8
16 | 218573_at | APR-1 protein/melanoma-associated antigen | MAGEH1 | Xp11.22 | NM_014061.1 | 34.0 | Above | 3.0
17 | 219485_s_at | proteasome (prosome, macropain) 26S subunit, non-ATPase, 10 | PSMD10 | Xq22.3 | NM_002814.1 | 34.0 | Above | 2.4
18 | 200655_s_at | Calmodulin 1 (phosphorylase kinase, delta) | CALM1 | 14q24-q31 | NM_006888.1 | 30.1 | Above | 1.7
19 | 200738_s_at | Phosphoglycerate kinase 1 | PGK1 | Xq13 | NM_000291.1 | 30.1 | Above | 1.8
20 | 200944_s_at | High-mobility group (nonhistone chromosomal) protein 14; member of the HMG 14/17 family | HMG14 | 21q22.2 | NM_004965.1 | 30.1 | Above | 1.7
21 | 201092_at | Retinoblastoma binding protein 7/RbAp46 | RBBP7 | Xp22.31 | NM_002893.2 | 30.1 | Above | 1.6
22 | 201100_s_at | Ubiquitin specific protease 9 | USP9X | Xp11.4 | NM_004652.2 | 30.1 | Above | 1.7
23 | 201688_s_at | Tumor protein D52 | TPD52 | 8q21 | BE974098 | 30.1 | Below | 4.1
24 | 201899_s_at | Ubiquitin-conjugating enzyme E2A (RAD6 homolog) | UBE2A | Xq24-q25 | NM_003336.1 | 30.1 | Above | 1.8
25 | 202325_s_at | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit F6 | ATP5J | 21q21.1 | NM_001685.1 | 30.1 | Above | 1.6
26 | 202829_s_at | Synaptobrevin-like 1 | SYBL1 | Xq28 | NM_005638.1 | 30.1 | Above | 1.5
27 | 202854_at | Hypoxanthine phosphoribosyltransferase 1 (Lesch-Nyhan syndrome) | HPRT1 | Xq26.1 | NM_000194.1 | 30.1 | Above | 1.4
28 | 206846_s_at | Histone deacetylase 6 | HDAC6 | Xp11.23 | NM_006044.2 | 30.1 | Above | 1.5
29 | 209370_s_at | SH3-domain binding protein 2 | SH3BP2 | 4p16.3 | AB000462.1 | 30.1 | Above | 3.1
30 | 209565_at | zinc finger protein 183 | ZNF183 | Xq25-q26 | BC000832.1 | 30.1 | Above | 2.2
31 | 212846_at | KIAA0179 protein | KIAA0179 | 21q22.3 | D80001.1 | 30.1 | Above | 2.0
32 | 217356_s_at | Phosphoglycerate kinase | PGK1 | Xq13 | S81916.1 | 30.1 | Above | 1.8
33 | 218163_at | MCT-1 protein | MCT-1 | Xq22-24 | NM_014060.1 | 30.1 | Above | 1.8
34 | 218386_x_at | Ubiquitin specific protease 16; de-ubiquitinates histone H2A; ubiquitous expression. | USP16 | 21q22.11 | NM_006447.1 | 30.1 | Above | 1.7
35 | 218402_s_at | Hermansky-Pudlak syndrome | HPS4 | | NM_022081.1 | 30.1 | Below | 3.4
36 | 218495_at | Ubiquitously-expressed transcript | UXT | Xp11.23-p11.22 | NM_004182.1 | 30.1 | Above | 1.5
37 | 218499_at | Mst3 and SOK1-related kinase/STE20-like kinase; contains a Ser/Thr protein kinase domain | MST4 | Xq26.1 | NM_016542.1 | 30.1 | Above | 2.5
38 | 218757_s_at | Similar to yeast Upf3, variant B | UPF3B | Xq25-q26 | NM_023010.1 | 30.1 | Above | 2.3
39 | 219038_at | Hypothetical protein FLJ11565 | FLJ11565 | Xq22.2 | NM_024657.1 | 30.1 | Above | 6.9
40 | 229967_at | Chemokine-like factor super family 2 | CKLFSF2 | 16q23.1 | AA778552 | 30.1 | Above | 4.3
41 | 242794_at | EST | | 4q31.1 | AI569476 | 30.1 | Above | 3.2
42 | 201132_at | Heterogeneous nuclear ribonucleoprotein H2 (H') | HNRPH2 | Xq22 | NM_019597.1 | 30.0 | Above | 2.0
43 | 201312_s_at | SH3 domain binding glutamic acid-rich protein like | SH3BGRL | Xq13.3 | NM_003022.1 | 30.0 | Above | 1.6
44 | 201894_s_at | Decorin; glycoprotein that binds to type I collagen fibrils & plays a role in matrix assembly. | DCN | 12q13.2 | NM_001920.1 | 30.0 | Above | 1.5
45 | 201923_at | Peroxiredoxin 4 | PRDX4 | Xp22.13 | NM_006406.1 | 30.0 | Above | 1.9
46 | 202371_at | Hypothetical protein FLJ21174 | FLJ21174 | Xq22.1 | NM_024863.1 | 30.0 | Above | 3.6
47 | 203126_at | Inositol(myo)-1(or 4)-monophosphatase 2 | IMPA2 | 18p11.2 | NM_014214.1 | 30.0 | Above | 4.1
48 | 204219_s_at | proteasome (prosome, macropain) 26S subunit, ATPase, 1 | PSMC1 | 19p13.3 | NM_002802.1 | 30.0 | Above | 1.3
49 | 204835_at | polymerase (DNA directed), alpha | POLA | Xp22.1-p21.3 | NM_016937.1 | 30.0 | Above | 2.0
50 | 212071_s_at | Spectrin, beta, non-erythrocytic 1 | SPTBN1 | 2p21 | BE968833 | 30.0 | Below | 1.7
51 | 212419_at | EST | | 10q22.3 | AL049949.1 | 30.0 | Above | 13.1
52 | 212718_at | Hypothetical protein MGC5370 | MGC5378 | 14q32.2 | BG110231 | 30.0 | Above | 1.5
53 | 213502_x_at | Homo sapiens cDNA FLJ32313 fis, clone PROST2003232, weakly similar to BETA-GLUCURONIDASE PRECURSOR (EC 3.2.1.31) | FLJ32313 | 22q11.23 | X03529 | 30.0 | Below | 1.8
54 | 214051_at | Thymosin, beta | TMSNB | Xq21.33-q22.3 | BF677486 | 30.0 | Above | 3.1
55 | 226039_at | Mannosyl (alpha-1,3)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase | MGAT4A | 2q11.2 | AW006441 | 30.0 | Above | 3.0
56 | 227279_at | hypothetical protein MGC15737 | MGC15737 | Xq22.1 | AA847654 | 30.0 | Above | 5.6
57 | 200642_at | Superoxide dismutase 1, soluble | SOD1 | 21q22.11 | NM_000454.1 | 26.7 | Above | 2.3
58 | 200799_at | Heat shock 70kD protein 1A | HSPA1A | 6p21.3 | NM_005345.3 | 26.7 | Above | 2.7
59 | 200943_at | High-mobility group (nonhistone chromosomal) protein 14; member of the HMG 14/17 family | HMG14 | 21q22.2 | NM_004965.1 | 26.7 | Above | 1.6
60 | 201018_at | Eukaryotic translation initiation factor 1A | EIF1A | Xp22.12 | BE542684 | 26.7 | Above | 1.8
61 | 201311_s_at | SH3 domain binding glutamic acid-rich protein like | SH3BGRL | Xq13.3 | AL515318 | 26.7 | Above | 1.6
62 | 201443_s_at | ATPase, H+ transporting, lysosomal interacting protein 2 | ATP6IP2 | Xq21 | AF248966.1 | 26.7 | Above | 1.9
63 | 201472_at | Von Hippel-Lindau binding protein 1 | VBP1 | Xq28 | NM_003372.2 | 26.7 | Above | 1.7
64 | 201689_s_at | Tumor protein D52 | TPD52 | 8q21 | BE974098 | 26.7 | Below | 4.3
65 | 202602_s_at | HIV TAT specific factor 1 | HTATSF1 | Xq26.1-q27.2 | NM_014500.1 | 26.7 | Above | 1.5
66 | 203041_s_at | Lysosomal-associated membrane protein 2 | LAMP2 | Xq24 | J04183.1 | 26.7 | Above | 3.1
67 | 203102_s_at | Mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase | MGAT2 | 14q21 | NM_002408.2 | 26.7 | Above | 1.6
68 | 203744_at | High-mobility group (nonhistone chromosomal) protein 4 | HMG4 | Xq28 | NM_005342.1 | 26.7 | Above | 1.9
69 | 205518_s_at | Cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMP-N-acetylneuraminate monooxygenase) | CMAH | 6p22-p23 | NM_003570.1 | 26.7 | Below | 2.9
70 | 208683_at | Calpain 2, (m/II) large subunit; calcium-dependent Cys protease. | CAPN2 | 1q41-q42 | M23254.1 | 26.7 | Above | 2.2
71 | 209440_at | Phosphoribosyl pyrophosphate synthetase 1; purine biosynthesis. | PRPS1 | Xq21-q27 | BC001605.1 | 26.7 | Above | 1.4
72 | 210786_s_at | Friend leukemia virus integration 1 | FLI1 | 11q24.1-q24.3 | M93255.1 | 26.7 | Below | 2.5
73 | 212070_at | G protein-coupled receptor 56 | GPR56 | 16q13 | AL554008 | 26.7 | Above | 2.4
74 | 213334_x_at | Three prime repair exonuclease 2 | TREX2 | Xq28 | BE676218 | 26.7 | Above | 1.7
75 | 215117_at | Recombination activating gene 2; V(D)J recombinase. | RAG2 | 11p13 | AW058148 | 26.7 | Below | 27.2
76 | 218694_at | ALEX1 protein | ALEX1 | Xq21.33-q22.2 | NM_016608.1 | 26.7 | Above | 2.8
77 | 222741_s_at | hypothetical protein FLJ11101 | FLJ11101 | 6p21.1 | AI761426 | 26.7 | Above | 1.5
78 | 223082_at | SH3-domain kinase binding protein 1 | SH3KBP1 | Xp22.1-p21.3 | AF230904.1 | 26.7 | Above | 2.0
79 | 225105_at | clone MGC:23936 IMAGE:3838595, mRNA, complete cds | | 12q23.3 | BF969397 | 26.7 | Above | 2.1
80 | 225406_at | Twisted gastrulation | TSG | 18p11.3 | AA195009 | 26.7 | Above | 1.9
81 | 225553_at | Homo sapiens cDNA FLJ12874 fis | | 14q22.2 | AL042817 | 26.7 | Above | 1.6
82 | 226199_at | Hypothetical protein MGC23937 | MGC23937 | Xq13.1 | AL563795 | 26.7 | Above |
83 | 226875_at | Hypothetical protein FLJ32122 | FLJ32122 | Xq24 | AI742838 | 26.7 | Above | 2.3
84 | 232974_at | cDNA FLJ12417 fis | | Xp22.31 | AU148256 | 26.7 | Above | 3.1
85 | 46323_at | SCAN-1 Ca++-dependent ER nucleoside diphosphatase/apyrase | SHAPY | 17q25.3 | AL120741 | 26.7 | Above | 1.7




86 | 203694_s_at | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 16 | DDX16 | 6p21.3 | NM_003587.2 | 26.3 | Above | 1.3
87 | 200658_s_at | Prohibitin | PHB | 17q21 | AL560017 | 26.3 | Above | 2.0
88 | 201898_s_at | ubiquitin-conjugating enzyme E2A (RAD6 homolog) | UBE2A | Xq24-q25 | AI126625 | 26.3 | Above | 1.6
89 | 203556_at | KIAA0854 protein | KIAA0854 | 8q24.13 | NM_014943.1 | 26.3 | Below | 1.6
90 | 203745_at | Holocytochrome c synthase (cytochrome c heme-lyase) | HCCS | Xp22.3 | AI801013 | 26.3 | Above | 2.1
91 | 203909_at | Solute carrier family 9 (sodium/hydrogen exchanger), isoform 6 | SLC9A6 | Xq26.3 | NM_006359.1 | 26.3 | Above | 1.9
92 | 204446_s_at | Arachidonate 5-lipoxygenase | ALOX5 | 10q11.2 | NM_000698.1 | 26.3 | Above | 4.2
93 | 205191_at | Retinitis pigmentosa 2 (X-linked recessive) | RP2 | Xp11.4-p11.21 | NM_006915.1 | 26.3 | Above | 2.1
94 | 206874_s_at | Ste20-related serine/threonine kinase | SLK | 10q25.1 | AL138761 | 26.3 | Above | 1.6
95 | 208073_x_at | Tetratricopeptide repeat domain 3 | TTC3 | 21q22.2 | NM_003316.1 | 26.3 | Above | 1.9
96 | 209056_s_at | CDC5 cell division cycle 5-like (S. pombe) | CDC5L | 6p21 | AW268817 | 26.3 | Above | 1.4
97 | 210645_s_at | Tetratricopeptide repeat domain 3 | TTC3 | 21q22.2 | D83077.1 | 26.3 | Above | 2.2
98 | 215773_x_at | ADP-ribosyltransferase (NAD+; poly(ADP-ribose) polymerase)-like 2 | ADPRTL2 | 14q11.2-q12 | AJ236912.1 | 26.3 | Above | 1.6
99 | 215884_s_at | Ubiquilin 2 | UBQLN2 | Xp11.23-p11.1 | AK001029.1 | 26.3 | Above | 1.9
100 | 217954_s_at | PHD finger protein 3 | PHF3 | 6 | NM_015153.1 | 26.3 | Above | 1.5



Table 66. Top 100 chi-square probe sets selected for MLL

No. | U133 probe set | Description | Symbol | Chromosomal location | GenBankRef | Chi-square value | MLL above/below mean | Fold change



1 | 202603_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | 15q22 | N51370 | 44.6 | Above | 1.8
2 | 219463_at | chromosome 20 open reading frame 103 | C20orf103 | 20p12 | NM_012261.1 | 44.6 | Above | 24.7
3 | 224772_at | neuron navigator 1 | NAV1 | | AB032977.1 | 44.6 | Below | 3.8
4 | 204069_at | Meis1, myeloid ecotropic viral integration site 1 homolog | MEIS1 | 2p14-p13 | NM_002398.1 | 44.4 | Above | 73.7
5 | 218966_at | myosin 5C | MYO5C | 15q21 | NM_018728.1 | 44.4 | Below | 4.5
6 | 226939_at | cDNA FLJ37247 fis | FLJ37247 | | AI202327 | 44.4 | Above | 6.9
7 | 204446_s_at | arachidonate 5-lipoxygenase | ALOX5 | 10q11.2 | NM_000698.1 | 40.7 | Below | 66.8
8 | 206492_at | fragile histidine triad gene | FHIT | 3p14.2 | NM_002012.1 | 40.7 | Below | 36.6
9 | 212588_at | protein tyrosine phosphatase, receptor type, C | PTPRC | 1q31-q32 | AI809341 | 40.7 | Above | 2.3
10 | 215925_s_at | CD72 antigen (ligand for CD5) | CD72 | 9p11.2 | AF283777.2 | 40.7 | Above | 3.0
11 | 211733_x_at | sterol carrier protein 2 | SCP2 | 1p32 | BC005911.1 | 40.1 | Above | 1.5
12 | 212386_at | cDNA FLJ11918 fis | FLJ11918 | | AK021980.1 | 40.1 | Below | D. 1
13 | 218764_at | Protein Kinase C eta isoform. | PRKCH | 14q22.1-q22.3 | NM_024064.1 | 40.1 | Below | 7.6
14 | 218847_at | IGF-II mRNA-binding protein 2 | IMP-2 | 3q28 | NM_006548.1 | 40.1 | Above | 23.2
15 | 222409_at | coronin, actin binding protein, 1C | CORO1C | 12q24.1 | AL162070.1 | 40.1 | Above | 4.8
16 | 242172_at | ESTs | | | N50406 | 40.1 | Above | 33.6
17 | 201153_s_at | muscleblind-like (Drosophila) | MBNL | 3q25 | NM_021038.1 | 40.0 | Above | 2.1
18 | 210487_at | deoxynucleotidyltransferase, terminal | DNTT | 10q23-q24 | M11722.1 | 40.0 | Below | 2.9
19 | 219686_at | gene for serine/threonine protein kinase | HSA250839 | 4p16.2 | NM_018401.1 | 40.0 | Below | 28.3
20 | 226981_at | Homo sapiens, clone IMAGE:4401491, mRNA | | | AW002079 | 37.4 | Below | 1.0
21 | 203375_s_at | tripeptidyl peptidase II | TPP2 | 13q32-q33 | NM_003291.1 | 37.2 | Above | 1.6
22 | 221676_s_at | coronin, actin binding protein, 1C | CORO1C | 12q24.1 | BC002342.1 | 37.2 | Above | 3.5
23 | 201152_s_at | muscleblind-like (Drosophila) | MBNL | 3q25 | NM_021038.1 | 36.2 | Above | Z.JL
24 | 221773_at | ELK3, ETS-domain protein (SRF accessory protein 2) | ELK3 | 12q23 | AW575374 | 36.2 | Below | 8.2
25 | 201162_at | insulin-like growth factor binding protein 7 | IGFBP7 | 4q12 | NM_001553.1 | 36.0 | Above | 4. j
26 | 201163_s_at | insulin-like growth factor binding protein 7 | IGFBP7 | 4q12 | NM_001553.1 | 36.0 | Above | 4.0
27 | 203836_s_at | mitogen-activated protein kinase kinase kinase 5 | MAP3K5 | 6q22.33 | D84476.1 | 36.0 | Above | 13.9
28 | 203837_at | mitogen-activated protein kinase kinase kinase 5 | MAP3K5 | 6q22.33 | NM_005923.2 | 36.0 | Above | 4.2
29 | 213891_s_at | cDNA FLJ11918 fis | FLJ11918 | | AI927067 | 36.0 | Below | 3.2
30 | 214895_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | 15q22 | AU135154 | 36.0 | Above | 1.9
31 | 226415_at | KIAA1576 protein | KIAA1576 | 16q22.1 | AA156723 | 36.0 | Above | 40.7
32 | 235879_at | ESTs | | | AI697540 | 36.0 | Above | 3.8
33 | 212387_at | cDNA FLJ11918 fis | FLJ11918 | | AK021980.1 | 35.8 | Below | 3.3
34 | 218988_at | bladder cancer overexpressed protein | BLOV1 | 12q15 | NM_018656.1 | 35.8 | Below |
35 | 228555_at | EST; by BLAT calcium/calmodulin-dependent Protein Kinase type II Delta chain (CAMK GROUP I) | CAMK2D | | AA029441 | 35.8 | Above | 3.1
36 | 202975_s_at | Rho-related BTB domain containing 3 | RHOBTB3 | 5q21.2 | N21138 | | Above | 5.5
37 | 201105_at | lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | 22q13.1 | NM_002305.2 | 34.5 | Above | 14.5
38 | 203434_s_at | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10) | MME | 3q25.1-q25.2 | AI433463 | 34.1 | Below | 31.2
39 | 212135_s_at | calcium transporting ATPase plasma membrane protein. | ATP2B4 | | AW517686 | 34.1 | Below | 2.4
40 | 212136_at | calcium transporting ATPase plasma membrane protein. | ATP2B4 | | AW517686 | 34.1 | Below | 2.1
41 | 230179_at | cDNA DKFZp547P158 | DKFZp547P158 | | N52572 | 34.1 | Below | 6.4
42 | 218217_at | likely homolog of rat and mouse retinoid-inducible serine carboxypeptidase | RISC | 17q23.2 | NM_021626.1 | 32.8 | Above | 3.4
43 | 225841_at | hypothetical protein FLJ30525 | FLJ30525 | 1p13.2 | BE502436 | 32.8 | Above | 1.8
44 | 226668_at | Homo sapiens, similar to WD domain, G-beta repeat containing protein | | | W80623 | 32.8 | Above | 2.4



45 | 200989_at | hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor) | HIF1A | 14q21-q24 | NM_001530.1 | 32.2 | Below | 1.8
46 | 201151_s_at | muscleblind-like (Drosophila) | | 3q25 | NM_021038.1 | 32.2 | Above | 2.6
47 | 201563_at | sorbitol dehydrogenase | | 15q15.3 | L29008.1 | 32.2 | Above | 1.8
48 | 203753_at | transcription factor 4 | TCF4 | 18q21.1 | NM_003199.1 | 32.2 | Below | 2.9
49 | 205668_at | lymphocyte antigen 75 | LY75 | 2q24 | NM_002349.1 | 32.2 | Above | 2.1
50 | 206471_s_at | plexin C1 | PLXNC1 | 12q23.3 | NM_005761.1 | 32.2 | Above | /. /
51 | 211302_s_at | phosphodiesterase 4B, cAMP-specific | PDE4B | 1p31 | L20966.1 | 32.2 | Below | 3.0
52 | 212012_at | Melanoma associated gene | D2S448 | 2pter-p25.1 | AF200348.1 | 32.2 | Below | 2.4
53 | 212063_at | CD44 antigen | | 11p13 | BE903880 | 32.2 | Above | 3.1
54 | 213241_at | plexin C1 | PLXNC1 | | AF035307.1 | 32.2 | Above | 2.5
55 | 214651_s_at | homeo box A9 | HOXA9 | 7p15-p14 | U41813.1 | 32.2 | Above | 28.5
56 | 218140_x_at | APMCF1 protein | APMCF1 | 3q22.2 | NM_021203.1 | 32.2 | Above | 1.4
57 | 219988_s_at | hypothetical protein FLJ10597 | FLJ10597 | 1p34.1 | NM_018150.1 | 32.2 | Above | 1.9
58 | 223046_at | egl nine homolog 1 (C. elegans) | EGLN1 | 1q42.1 | NM_022051.1 | 32.2 | Below |
59 | 224150_s_at | p10-binding protein | BITE | 3q22-q23 | AF289495.1 | 32.2 | Above | 2.1
60 | 224933_s_at | hypothetical protein DKFZp761F0118 | DKFZp761F0118 | 10q22.1 | AB037801.1 | 32.2 | Above | 1.9
61 | 201078_at | transmembrane 9 superfamily member 2 | TM9SF2 | 13q32.3 | NM_004800.1 | 32.0 | Above | 1.5
62 | 205550_s_at | brain and reproductive organ-expressed (TNFRSF1A modulator) | BRE | 2p23.3 | NM_004899.1 | 32.0 | Above | 2.0
63 | 212382_at | cDNA FLJ11918 fis | FLJ11918 | | AK021980.1 | 32.0 | Below | 2.7
64 | 225019_at | calcium/calmodulin-dependent protein kinase (CaM kinase) II delta | CAMK2D | 4q25 | AA777512 | 32.0 | Above | 3.6
65 | 225202_at | Rho-related BTB domain containing 3 | RHOBTB3 | 5q21.2 | BE620739 | 32.0 | Above | 5.5
66 | 228855_at | nudix (nucleoside diphosphate linked moiety X)-type motif 7 | NUDT7 | | AI927964 | 32.0 | Above | 5.6
67 | 231899_at | KIAA1726 protein | KIAA1726 | 11q23.1 | AB051513.1 | 32.0 | Above | 33.0
68 | 52164_at | chromosome 11 open reading frame 24 | C11orf24 | 11q13 | AA065185 | 32.0 | Above | 2.3
69 | 212660_at | KIAA0239 protein | KIAA0239 | 5q31.1 | AI735639 | 31.7 | Below | 1.7
70 | 213513_x_at | actin related protein 2/3 complex, subunit 2, 34kDa | ARPC2 | 2q36.1 | BG034239 | 31.7 | Above | 1.3
71 | 222603_at | hypothetical protein FLJ23309 | FLJ23309 | 9p24 | AL136980 | 31.7 | Above | 3.6
72 | 238558_at | ESTs | | | AI445833 | 31.7 | Above | 3.8
73 | 202391_at | brain abundant, membrane attached signal protein 1 | BASP1 | 5p15.1-p14 | NM_006317.1 | 31.3 | Above | 2.1
74 | 202604_x_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | 15q22 | NM_001110.1 | 31.3 | Above | 1.8
75 | 203435_s_at | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10) | MME | 3q25.1-q25.2 | NM_007287.1 | 31.3 | Below | 54.8
76 | 204445_s_at | arachidonate 5-lipoxygenase | ALOX5 | 10q11.2 | AI361850 | 31.3 | Below | 6S7.0
77 | 209705_at | likely ortholog of mouse metal response element binding transcription factor 2 | M96 | 1p22.1 | AF073293.1 | 31.3 | Below | 1.5
78 | 214366_s_at | arachidonate 5-lipoxygenase | ALOX5 | 10q11.2 | AA995910 | 31.3 | Below | 54.7
79 | 215000_s_at | fasciculation and elongation protein zeta 2 (zygin II) | FEZ2 | 2p21 | AL117593.1 | 31.3 | Above | 1.7
80 | 220643_s_at | Fas apoptotic inhibitory molecule | FAIM | 3q23 | NM_018147.1 | 31.3 | Above | 2.9
81 | 226459_at | Homo sapiens gastric cancer-related protein GCYS-20 (gcys-20) mRNA, complete cds; homology with mouse epidermal growth factor receptor pathway substrate 8 | | | AW575754 | 31.3 | Above | 1.6
82 | 238712_at | ESTs | | | BF801735 | 31.3 | Above | 2.7
83 | 229686_at | cDNA FLJ35637 fis | FLJ35637 | | AI436587 | 31.0 | Below | 1.5
84 | 222620_s_at | hypothetical protein similar to mouse Dnajl1 | DNAJL1 | 10p11.23 | BF591419 | 29.8 | Above | 2.4
85 | 224516_s_at | hypothetical protein HSPC195 | HSPC195 | 5q31.3 | BC006428.1 | 29.8 | Above | 2.7






86 | 203217_s_at | sialyltransferase 9 (CMP-NeuAc:lactosylceramide alpha-2,3-sialyltransferase; GM3 synthase) | SIAT9 | 2p11.2 | NM_003896.1 | 28.8 | Below | 2.1
87 | 204030_s_at | schwannomin interacting protein 1 | SCHIP1 | 3q25.32 | NM_014575.1 | 28.8 | Below | 17.6
88 | 209191_at | tubulin beta-5 | TUBB-5 | | BC002654.1 | 28.8 | Above | 6.4
89 | 213541_s_at | v-ets erythroblastosis virus E26 oncogene like (avian) | ERG | 21q22.3 | AI351043 | 28.8 | Below | 2.8
90 | 213773_x_at | Williams Beuren syndrome chromosome region 20A | WBSCR20A | 7q11.23 | AW248552 | 28.8 | Above | 1.3
91 | 219243_at | immunity associated protein 4 | HIMAP4 | 7q35 | NM_018326.1 | 28.8 | | 13.4
92 | 219256_s_at | hypothetical protein FLJ20356 | FLJ20356 | 4p16.1 | NM_018986.1 | 28.8 | Below | 2.6
93 | 223358_s_at | phosphodiesterase 7A | PDE7A | 8q13 | AW269834 | 28.8 | Above | 1.5
94 | 224796_at | development and differentiation enhancing factor 1 | DDEF1 | 8q24.1-q24.2 | W03103 | 28.8 | Below | 1.8
95 | 203076_s_at | MAD, mothers against decapentaplegic homolog 2 (Drosophila) | MADH2 | 18q21.1 | U65019.1 | 28.7 | Below | 2.0
96 | 212385_at | cDNA FLJ11918 fis | FLJ11918 | | AK021980.1 | 28.7 | Below | 3.2
97 | 216026_s_at | polymerase (DNA directed), epsilon | POLE | 12q24.3 | AL080203.1 | 28.7 | Below | 3.0
98 | 217118_s_at | KIAA0930 protein | KIAA0930 | 22q13.31 | AK025608.1 | 28.7 | Above | 1.9
99 | 219821_s_at | hypothetical protein FLJ20330 | FLJ20330 | 6pter-p22.1 | NM_018988.1 | 28.7 | Below | 5.5
100 | 201875_s_at | hypothetical protein FLJ21047 | FLJ21047 | 1q23.2 | NM_024569.1 | 28.5 | Above | 2.0



Table 67. Top 100 chi-square probe sets selected for T-ALL 

No. | U133 probe set | Gene description | Symbol | Chromosomal location | GenBankRef | Chi-square | T-ALL above/below mean | Fold change



1 | 201137_s_at | major histocompatibility complex, class II, DP beta 1 | HLA-DPB1 | 6p21.3 | NM_002121.1 | 100.0 | Below | 21.0
2 | 202113_s_at | sorting nexin 2 | SNX2 | 5q23 | AF043453.1 | 100.0 | Below | 4.2






3 | 202114_at | sorting nexin 2 | SNX2 | 5q23 | NM_003100.1 | 100.0 | Below | 4.6
4 | 203675_at | nucleobindin 2 | NUCB2 | 11p15.1-p14 | NM_005013.1 | 100.0 | Above | 3.6
5 | 204670_x_at | major histocompatibility complex, class II, DR beta 3 | HLA-DRB3 | 6p21.3 | NM_002125.1 | 100.0 | Below | 13.4
6 | 205297_s_at | CD79B antigen (immunoglobulin-associated beta) | CD79B | 17q23 | NM_000626.1 | 100.0 | Below | 23.3
7 | 205456_at | CD3E antigen, epsilon polypeptide (TiT3 complex) | CD3E | 11q23 | NM_000733.1 | 100.0 | Above | 20.7
8 | 206398_s_at | CD19 antigen | CD19 | 16p11.2 | NM_001770.1 | 100.0 | Below | 5693.6
9 | 208306_x_at | major histocompatibility complex, class II, DR beta 4 | HLA-DRB4 | 6p21.3 | NM_021983.2 | 100.0 | Below | 8.3
10 | 208894_at | major histocompatibility complex, class II, DR alpha | HLA-DRA | 6p21.3 | M60334.1 | 100.0 | Below | 20.9
11 | 209312_x_at | major histocompatibility complex, class II, DR beta 1 | HLA-DRB1 | 6p21.3 | U65585.1 | 100.0 | Below | 12.6
12 | 209619_at | CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated) | CD74 | 5q32 | K01144.1 | 100.0 | Below | 15.1
13 | 210116_at | SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome) | SH2D1A | Xq25-q26 | AF072930.1 | 100.0 | Above | 150.7
14 | 210982_s_at | major histocompatibility complex, class II, DR alpha | HLA-DRA | 6p21.3 | M60333.1 | 100.0 | Below | 23.4
15 | 211990_at | major histocompatibility complex, class II, DP alpha 1 | HLA-DPA1 | 6p21.3 | M27487.1 | 100.0 | Below | 19.6
16 | 211991_s_at | major histocompatibility complex, class II, DP alpha 1 | HLA-DPA1 | 6p21.3 | M27487.1 | 100.0 | Below | 24.5
17 | 213539_at | CD3D antigen, delta polypeptide (TiT3 complex) | CD3D | 11q23 | NM_000732.1 | 100.0 | Above | 35.7
18 | 214049_x_at | CD7 antigen (p41) | CD7 | 17q25.2-q25.3 | AI829961 | 100.0 | Above | 312.2
19 | 214551_s_at | CD7 antigen (p41) | CD7 | 17q25.2-q25.3 | NM_006137.2 | 100.0 | Above | 228.1




20 | 217147_s_at | T-cell receptor interacting molecule | TRIM | 3q13 | AJ240085.1 | 100.0 | Above | 42.6
21 | 217478_s_at | MHC, class IIa, HLA-DMA | HLA-DMA | | X76775 | 100.0 | Below | 11.9
22 | 221969_at | paired box gene 5 (B-cell lineage specific activator protein) | PAX5 | 9p13 | BF510692 | 100.0 | Below | 3922.0
23 | 227646_at | early B-cell factor | EBF | 5q34 | BG435302 | 100.0 | Below | 85.0
24 | 229487_at | cDNA FLJ39389 fis | FLJ39389 | 5 | W73890 | 100.0 | Below | 7685.7
25 | 229838_at | cDNA FLJ39156 fis | FLJ39156 | | AI377271 | 100.0 | Above | 12.7
26 | 232204_at | early B-cell factor | EBF | 5q34 | AF208502.1 | 100.0 | Below | 7129.1
27 | 203965_at | ubiquitin specific protease 20 | USP20 | 9q34.12-q34.13 | NM_006676.1 | 91.3 | Above | 9.0
28 | 204891_s_at | lymphocyte-specific protein tyrosine kinase | LCK | 1p34.3 | NM_005356.1 | 91.3 | Above | 13.8
29 | 205255_x_at | transcription factor 7 (T-cell specific, HMG-box) | TCF7 | 5q31.1 | NM_003202.1 | 91.3 | Above | 8.4
30 | 207655_s_at | B-cell linker | BLNK | 10q23.2-q23.33 | NM_013314.1 | 91.3 | Below | 103.2
31 | 209771_x_at | CD24 antigen (small cell lung carcinoma cluster 4 antigen) | CD24 | 6q21 | AA761181 | 91.3 | Below | 40.1
32 | 211796_s_at | T cell receptor beta locus | TRB | 7q34 | AF043179.1 | 91.3 | Above | 20.7
33 | 213792_s_at | insulin receptor | INSR | 19p13.3-p13.2 | AA485908 | 91.3 | Below | 8.0
34 | 215193_x_at | major histocompatibility complex, class II, DR beta 3 | HLA-DRB3 | 6p21.3 | AJ297586.1 | 91.3 | Below | 12.1
35 | 216379_x_at | KIAA1919 protein | KIAA1919 | 6q22.1 | AK000168.1 | 91.3 | Below | 44.0
36 | 219191_s_at | bridging integrator | BIN2 | 12q13 | NM_016293.1 | 91.3 | Above | 271.0
37 | 219563_at | hypothetical protein FLJ21276 | FLJ21276 | 14q32.2 | NM_024633.1 | 91.3 | Below | 5.8
38 | 219724_s_at | KIAA0748 gene product | KIAA0748 | 12q12 | NM_014796.1 | 91.3 | Above | 11.0
39 | 221750_at | 3-hydroxy-3-methylglutaryl-Coenzyme A synthase 1 (soluble) | HMGCS1 | 5p14-p13 | BG035985 | 91.3 | Above | 3.4
40 | 226157_at | cDNA FLJ39131 fis | FLJ39131 | 3 | AI569747 | 91.3 | Above | A A
41 | 226496_at | hypothetical protein FLJ22611 | FLJ22611 | 9p11.1 | BG291039 | 91.3 | Below | 7.6
42 | 266_s_at | CD24 antigen (small cell lung carcinoma cluster 4 antigen) | CD24 | 6q21 | L33930 | 91.3 | Below | 69.7






43 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | 14q32.1 | X82240 | 91.3 | Below | DO 1 A
44 | 204214_s_at | RAB32, member RAS oncogene family | RAB32 | 6q24.3 | NM_006834.1 | 90.6 | Above | lz/.y
45 | 204777_s_at | mal, T-cell differentiation protein | MAL | 2cen-q13 | NM_002371.2 | 90.6 | Above | yo.o
46 | 204890_s_at | lymphocyte-specific protein tyrosine kinase | LCK | 1p34.3 | U07236.1 | 90.6 | Above | lo.O
47 | 205049_s_at | CD79A antigen (immunoglobulin-associated alpha) | CD79A | 19q13.2 | NM_001783.1 | 90.6 | Below | 11.4
48 | 205254_x_at | transcription factor 7 (T-cell specific, HMG-box) | TCF7 | 5q31.1 | AW027359 | 90.6 | Above | 352.0
49 | 205504_at | Bruton agammaglobulinemia tyrosine kinase | BTK | Xq21.33-q22 | NM_000061.1 | 90.6 | Below | 6.6
50 | 210915_x_at | T cell receptor beta locus | TRB | 7q34 | M15564.1 | 90.6 | Above | 15.9
51 | 211211_x_at | SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome) | SH2D1A | Xq25-q26 | AF100542.1 | 90.6 | Above | 1963.5
52 | 213830_at | T cell receptor delta locus | TRD | 14q11.2 | AW007751 | 90.6 | Above | 7411.2
53 | 216191_s_at | T cell receptor delta locus | TRD | 14q11.2 | X72501.1 | 90.6 | Above | 253.7
54 | 217143_s_at | T cell receptor delta locus | TRD | 14q11.2 | X06557.1 | 90.6 | Above | 1 < 1 O
55 | 219528_s_at | B-cell CLL/lymphoma 11B (zinc finger protein) | BCL11B | 14q32.31-q32.32 | NM_022898.1 | 90.6 | Above | 11.6
56 | 220418_at | ubiquitin associated and SH3 domain containing, A | UBASH3A | 21q22.3 | NM_018961.1 | 90.6 | Above | 759.3
57 | 222895_s_at | B-cell CLL/lymphoma 11B (zinc finger protein) | BCL11B | 14q32.31-q32.32 | AA918317 | 90.6 | Above | 11.7
58 | 223553_s_at | hypothetical protein FLJ22570 | FLJ22570 | 5q35.3 | BC004564.1 | 90.6 | Below | 6.1
59 | 225090_at | HRD1 protein | HRD1 | 11q12 | AA844682 | 90.6 | Below | 3.6
60 | 226459_at | Homo sapiens gastric cancer-related protein GCYS-20 (gcys-20) mRNA, complete cds | | | AW575754 | 90.6 | Below | 10.7
61 | 228314_at | cDNA FLJ37485 | FLJ37485 | | BE877357 | 90.6 | Below | 4.7






62 | 201384_s_at | membrane component, chromosome 17, surface marker 2 (ovarian carcinoma antigen CA125) | M17S2 | 17q21.1 | NM_005899.1 | 83.8 | Above | 3.3
63 | 202540_s_at | 3-hydroxy-3-methylglutaryl-Coenzyme A reductase | HMGCR | 5q13.3-q14 | NM_000859.1 | 83.8 | Above | 4.4
64 | 203198_at | cyclin-dependent kinase 9 (CDC2-related kinase) | CDK9 | 9q34.1 | NM_001261.1 | 83.8 | Below | 4.8
65 | 203932_at | major histocompatibility complex, class II, DM beta | HLA-DMB | 6p21.3 | NM_002118.1 | 83.8 | Below | 7.9
66 | 204613_at | phospholipase C, gamma 2 (phosphatidylinositol-specific) | PLCG2 | 16q24.1 | NM_002661.1 | 83.8 | Below | 3.9
67 | 205267_at | POU domain, class 2, associating factor 1 | POU2AF1 | 11q23.1 | NM_006235.1 | 83.8 | Below | 11.2
68 | 208650_s_at | CD24 antigen (small cell lung carcinoma cluster 4 antigen) | | 6q21 | BG327863 | 83.8 | Below | 74.7
69 | 208651_x_at | CD24 antigen (small cell lung carcinoma cluster 4 antigen) | CD24 | 6q21 | M58664.1 | 83.8 | Below | 52.7
70 | 209995_s_at | T-cell leukemia/lymphoma 1A | TCL1A | 14q32.1 | BC003574.1 | 83.8 | Below | 20166.2
71 | 210038_at | protein kinase C, theta | PRKCQ | 10p15 | AL137145 | 83.8 | Above | 12.7
72 | 211126_s_at | cysteine and glycine-rich protein 2 | CSRP2 | 12q21.1 | U46006.1 | 83.8 | Below | 18.0
73 | 220068_at | pre-B lymphocyte gene 3 | VPREB3 | 22q11.23 | NM_013378.1 | 83.8 | Below | 6559.8
74 | 226245_at | cDNA DKFZp451C132 | DKFZp451C132 | | U55984 | 83.8 | Above | 8.7
75 | 202615_at | cDNA DKFZp686D0521 | DKFZp686D0521 | | BF222895 | 82.2 | Above | 3.1
76 | 224861_at | cDNA FLJ31057 fis | FLJ31057 | | BF477658 | 82.2 | Above | 3.5
77 | 201194_at | selenoprotein W, 1 | SEPW1 | 19q13.3 | NM_003009.1 | 82.0 | Above | 3.8
78 | 201349_at | solute carrier family 9 (sodium/hydrogen exchanger), isoform 3 regulatory factor 1 | SLC9A3R1 | 17q25.2 | NM_004252.1 | 82.0 | Above | 2.9
79 | 202539_s_at | 3-hydroxy-3-methylglutaryl-Coenzyme A reductase | HMGCR | 5q13.3-q14 | AL518627 | 82.0 | Above | 3.5
80 | 203588_s_at | transcription factor Dp-2 (E2F dimerization partner 2) | TFDP2 | 3q23 | BG034328 | 82.0 | Above | 17.5
81 | 204852_s_at | protein tyrosine phosphatase, non-receptor type 7 | PTPN7 | 1q32.1 | NM_002832.1 | 82.0 | Above | 9.5
82 | 207434_s_at | FXYD domain containing ion transport regulator 2 | FXYD2 | 11q23 | NM_021603.1 | 82.0 | Above | 14.6
83 | 208872_s_at | DNA segment, single copy probe LNS-CAI/LNS-CAII | D5S346 | 5q22-q23 | AA814140 | 82.0 | Below | 2.6
84 | 209200_at | MADS box transcription enhancer factor 2, polypeptide C (myocyte enhancer factor 2C) | MEF2C | | N22468 | 82.0 | Below | 7.5
85 | 212795_at | KIAA1033 protein | KIAA1033 | 12q24.11 | AL137753.1 | 82.0 | Below | 2.4
86 | 212827_at | immunoglobulin heavy constant mu | IGHM | 14q32.33 | X17115.1 | 82.0 | Below | 13.1
87 | 213193_x_at | T cell receptor beta locus | TRB | 7q34 | AL559122 | 82.0 | Above | 10.9
88 | 221002_s_at | tetraspanin similar to TM4SF9 | DC-TM4F2 | 10q23.2 | NM_030927.1 | 82.0 | Below | 2.1
89 | 225314_at | hypothetical protein MGC45416 | MGC45416 | 4p12 | BG291649 | 82.0 | Above | 5.5
90 | 227432_s_at | insulin receptor | INSR
91 | 203332_s_at | inositol polyphosphate-5-phosphatase, 145kDa | INPP5D
92 | 203589_s_at | transcription factor Dp-2 (E2F dimerization partner 2) | TFDP2
93 | 205674_x_at | FXYD domain containing ion transport regulator 2 | FXYD2
94 | 209881_s_at | Linker for activation of T cells | LAT
95 | 211005_at | Linker for activation of T cells | LAT
96 | 211075_s_at | CD47 | CD47
97 | 211210_x_at | SH2 domain protein 1A, | SH2D1A


pl3.2 
2q36-q37 


AI215106 
NM_005541.1 


82.0 
81.5 


Below 
Below 


6.0 
2.2 


3q23 


NM_006286.1 


81.5 


Above 


35.1 


llq23 


NM_00 1680.2 


81.5 


Above 


12.2 


16ql3 


AF036905.1 


SI. 5 


Above 




16ql3 


AF036906.1 


81.5 


Above 


67.8 


Xq25- 


Z25521.1 
AF100539.1 


81.5 
81.5 


Above 
Above 


2.1 
300.2 



q26 
162- 



BNSDOCID: <WO 030831 40A2J_> 



WO 03/083140 



PCT/US03/08486 



98 213601_at 

99 213857 s at 



100 214924_s_at 



Duncan's disease 
(lymphoproliferati 
ve syndrome) 
slit homolog 1 
(Drosophila) 
CD47 antigen 
(Rh-related 
antigen, integrin- 
associated signal 
transducer) 
KIAA1042 
protein 



SLIT1 



CD47 



10q23.3- 
q24 

3ql3.1- 
ql3.2 



AB011537.2 S1.5 
BG230614 81.5 



Above 1752.1 
Above 2.2 



KIAA104 3p25.3- 
2 p24.1 



AK000754.1 81.5 Below 2.3 



Table 68. Top 100 chi-square probe sets selected for TEL-AML1 



U133 probe 
set 



Gene 

Description 



Chromo- 
somal 







TEL- 








AML 






Chi- 


above/ 






square 


below 


Fold 


GenBank Ref 


value 


mean 


change 


W80418 


75 


Above 


7.6 


AK022784.1 


75 


Above 


2446.3 


AI452798 


75 


Above 


23.7 


BF5 13468 


75 


Above 


13.4 


NM_001999.2 


69.1 


Above 


14.4 


NM_0 15320.1 


69.1 


Above 


148.1 


BC001 304.1 


69.1 


Above 


101.2 


AB011131.1 


69.1 


Above 


77.5 


NM_022161.1 


69.1 


Above 


25.4 


W80418 


69.1 


Above 


4.3 


N49233 


69.1 


Above 


9.3 



1 224722 at KIAA1323 



2 
3 
4 
5 



7 
8 
9 



227377_at 
237206_at 
241505_at 
203184 at 



205109 s at 



210650_s_at 
213558_at 
220451 s at 



10 224720_at 

11 235694_at 

12 202808_at 

13 206032_at 

14 206033_s_at 

15 209228_x_at 

16 224725_at 

17 203910_at 

18 204849 at 



FLJ12722 

EST 

EST 

Fibrillin 2 

(congenital 

contractural 

arachnodactyly) 

Rho guanine 

nucleotide 

exchange factor 

(GEF) 4 

Piccolo 

Piccolo 

Livin IAP 

(inhibitor of 

apoptosis) 

KIAA1323 



KIAA132 
3 



FBN2 



ARHGEF 
4 

PCLO 
PCLO 
BIRC7 



KIAA132 
3 



17pl2 

5q23.2 

2q22 



IMAGE:4661943 
Unknown EST 
Hypothetical 
protein FLJ20154 
Desmocollin 3 
Desmocollin 3 
Putative prostate 
cancer tumor i 
suppressor gene 
N33 

KIAA1323 



PTPL1 -associated 

RhoGAP 

Transcription 



FLJ20154 AK00016L1 68.9 
10q24.32 

DSC3 18ql2.1 AI797281 68.9 

DSC3 18ql2.1 NM_001941.2 68.9 

N33 8p22 U42349.1 68.9 



Above 

Above 
Above 
Above 



lSqll.l W80418 

KIAA132 
3 

PARG1 lp22.1 



68.9 



NM 004815.1 64 



3.7 

54.1 
357.1 
20.8 
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JZ? 


991747 at 


Tensin 


TNS 


2q35 


AL046979 


57.1 


Above 


49.2 


40 


224726_at 


KIAA1323 




lSqll.l 


W80418 


57.1 


Above 


26.1 








KIAA132 














9^14^5 at 

£*J ltJJ 


ESTs 


J 


2p25.2 


A A T/CCCQQ 

AA /Ooooo 


57.1 


Above 


7.7 


42 


232750_at 


Homo sapiens 


rL>J Id / DU 


Z(\jj 




57.1 


Above 


35.0 






cDNA FLJ 13750 










Above 


1.9 


43 


209685„s_at 


Protein kinase C, 


T>T» T." /^"D 1 

PRKCtU 


lopi 1.Z 


1VJLJL jy / j.i 


53.6 






beta 1 










Above 


2.0 


AA 


904404 at 


cot t:i,a 

Jio l luce 


<3T PI 9 A9 


5q23.3 


NM 001046.1 


53.4 






Na+/K*r/Cl- 


















transporter with 


















AA permease 


















domain, memb 2 
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46 
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47 
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5.4 


48 
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MTV! 00044 S 1 
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activating gene 1 
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49 
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53 
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54 
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55 


211S91_s_at 
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zqzz 
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exchange factor 


A 
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2.0 


56 
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1 /pi j.j 


AT Jl S 1 801 


51.6 
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1.7 


57 
Biologic insights from the new class defining genes

Interestingly, the overall quantitative pattern of expression of discriminating
genes varied significantly between leukemia subtypes (Table 69). Within the B-cell
lineage leukemia subtypes, E2A-PBX1, TEL-AML1, BCR-ABL, and Hyperdiploid
>50 chromosomes were characterized primarily by genes that were overexpressed,
whereas almost 40% of the discriminating genes that characterized MLL fusion gene
expressing leukemias were underexpressed. More remarkably, the discriminating
genes for the leukemia subtypes defined by chimeric transcription factors were
markedly overexpressed, with an average fold increase of 112 and 48 for E2A-PBX1
and TEL-AML1, respectively. By contrast, the discriminating genes for BCR-ABL
and MLL fusion gene expressing leukemias showed average fold increases of only
6.8 and 8.6, respectively, whereas the discriminating genes for hyperdiploid >50
chromosomes had an average fold increase of only 2.6. These data suggest that
the quantitative global changes in a cell's expression profile vary markedly depending
on the genetic lesion(s) that underlie the initiation of the leukemic process.



Table 69. Summary of fold change by diagnostic subgroup (by gene)

Subgroup             Mean fold change   Range
BCR-ABL              6.8                1.1-90.5
E2A-PBX1             112.0              1.6-5435
Hyperdiploid >50     2.6                1.3-27.2
MLL rearrangement    8.6                1.0-75
T-ALL                387                2.1-7685
TEL-AML1             48.3               1.5-2446

Tables 70-74 show genes whose expression is limited to a single B-cell
lineage class, and that therefore function not only as class discriminators in the
decision tree format but also as class discriminators in a parallel format in which a
class is distinguished against all others. Thus, these genes have the potential of
serving as unique class specific diagnostic or therapeutic targets. In addition, these
genes may provide unique insights into the underlying biology of the different
leukemia subtypes. For example, BCR-ABL expressing ALLs are characterized by the
overexpression of Dynactin 4, which encodes a RING finger containing protein that is
part of the 20S dynactin multisubunit complex involved in movement, intracellular
transport and division through its interaction with the cytoplasmic microtubule-based
motor dynein; PSTPIP2, which encodes a proline/serine/threonine
phosphatase-interacting protein that is also involved in controlling the organization
of the cytoskeleton, and is tyrosine phosphorylated following activation of receptor
tyrosine kinases (Karki et al. (2000) J. Biol. Chem. 275:4834-4839); and several
novel ESTs.
Table 70: Genes highly Correlated with BCR-ABL

GenBank Reference    Gene Description
AK002064             DKrZP5 64 A241o nistone xlj signature
rSJti/loUZo
NM_024600            FLJ20898
NM_024430            Pro-Ser-Thr phosphatase interac. protein 2
AV648669             FLJ39877

E2A-PBX1 expressing leukemias are characterized by the expression of
PBX1, the receptor tyrosine kinase gene C-MERTK, and the FAT tumor suppressor,
which encodes a member of the cadherin repeat domain containing family of
transmembrane proteins (see Table 64). Among the discriminating genes were two
genes, EB-1 and Wnt16, that had previously been shown to be overexpressed in this
leukemia subtype (Wu et al. (1998) J. Biol. Chem. 273:30487-30496; and Fu et al.
(1999) Oncogene 18:4920-4929). In addition, the retinal degeneration B beta gene
(McWhirter et al. (1999) Proc. Natl. Acad. Sci. USA 96:11464-11469) and a
number of novel ESTs were identified as being uniquely overexpressed in this
leukemia subtype, whereas the SOCS2 negative regulator of cytokine signaling was
found to be underexpressed (Fullwood and Hsuan (1999) J. Biol. Chem.
274:31553-31558) [26].



Table 71: Genes highly Correlated with E2A-PBX1

GenBank Reference    Gene Description
NM_012417            retinal degeneration B beta
AI971602             MGC10485
AW005572             EB-1
AL357503             Q9H4T4 like
NM_016087            Wnt16



Hyperdiploid leukemias with >50 chromosomes were characterized by the
overexpression of MST4, which encodes a novel serine/threonine kinase (Horvat and
Medrano (2001) Genomics 72:209-212); SH3BP2, which encodes an SH3-domain
containing binding protein (Lin et al. (2001) Oncogene 20:6559-6569); histone
deacetylase 6, which encodes a protein involved in transcriptional repression; the
retinoblastoma binding protein 7 gene, which encodes a protein found in many
functional histone deacetylase complexes (Bell et al. (1997) Genomics 44:163-170);
and TNRC11, a trinucleotide repeat containing gene that is also known as HOPA or
TRAP230 and is part of the thyroid hormone receptor-associated protein (TRAP)
complex (Huang et al. (1991) Nature 350:160-162; and Ito et al. (1999) Mol. Cell
3:361-370).






Table 72: Genes highly Correlated with Hyperdiploid >50

GenBank Reference    Gene Description
NM_002893            Retinoblastoma binding protein 7
AB000462             SH3-domain binding protein 2
NM_006044            Histone deacetylase 6
BC004354             trinucleotide repeat containing 11
NM_016542            Mst3 and SOK1-related kinase



Cases with MLL gene rearrangements were characterized by the
overexpression of HOXA9 and Meis1 (see Table 66). Included in the up-regulated genes
was a novel transcript from chromosome 20 that was overexpressed almost 25 fold.
This transcript is predicted to encode a protein of 280 amino acids that shows a low
level of homology to a lysosome-associated membrane glycoprotein (LAMP). Also
specifically overexpressed in this leukemia subtype is a gene encoding an insulin
growth factor (IGF) II RNA binding protein that has been shown to repress the
translation of the IGF-II growth factor (Armstrong et al. (2002) Nat. Genet.
30:41-47). Among the down regulated genes was neuron navigator 1 (Nielsen et al.
(1999) Mol. Cell. Biol. 19:1262-1270), which encodes an 1874 amino acid protein and
is involved in direction guidance of migratory cells, and a member of the TCF/LEF
family of transcription factors, TCF-4. TCF-4 functions downstream of β-catenin in
the Wnt-mediated signaling cascade and has been shown to be essential for the
maintenance of intestinal crypt stem cells (Maes et al. (2002) Genomics 80:21-30).
Table 73: Genes highly Correlated with MLL

GenBank Reference    Gene Description
NM_012261            C20orf103
AI202327             FLJ37247
NM_006548            IGF-II mRNA binding protein 2
NM_018401            gene for serine/threonine protein kinase
NM_018728            myosin 5C
AB032977             neuron navigator 1



Genes that were discriminators of TEL-AML1 leukemias included a gene
localized to chromosome 18q11.1 that encodes a 795 amino acid protein that has 8
ankyrin repeat domains and a C-terminal RING finger domain. This combination of
domains is identified in only a limited number of mammalian proteins, most notably
BARD1, a regulator of the BRCA1 tumor suppressor (Korinek et al. (1998) Nat.
Genet. 19:379-383). Other genes overexpressed in the subtype include desmocollin
(Irminger-Finger and Leung (2002) Int. J. Biochem. Cell Biol. 34:582-587), FLJ12722,
a novel protein of unknown function, and a member of the IAP family of apoptosis
inhibitors, BIRC7, which is overexpressed 25 fold (Whittock et al. (2000) Biochem.
Biophys. Res. Commun. 276:454-460).



Table 74: Genes highly Correlated with TEL-AML1

GenBank Reference    Gene Description
W80418               KIAA1323
AK022784             FLJ12722
NM_022161            BIRC7
AI452798             FLJ39434
AI797281             Desmocollin 3
Expression profiling accurately identifies the prognostic subtypes of ALL

To assess the accuracy of identifying prognostically important ALL genetic
subtypes by expression profiling, the class discriminating genes identified using a
chi-squared metric were used in an ANN-based supervised learning algorithm. Class
assignment utilized the decision tree differential diagnostic format described
elsewhere herein, and required that the node value for assignment exceed a
statistically defined confidence level. This approach resulted in exceptionally
accurate class prediction in a randomly selected training set that consisted of
three-fourths of the total cases (100 cases). When this classification model was then
applied to a blinded test set consisting of the remaining 32 samples, an overall
accuracy of 97% was achieved for class assignment. To control for over-fitting of the
data, 10 additional rounds of this analysis were performed; in each round, new
training and test sets were developed, genes were reselected using the new training
set, and their performance was then assessed on the new test set. This resulted in an
average accuracy of class assignment in the blinded test sets of 97.2%, with a range
from 93.8% to 100%. Although the number of genes required for optimal class
assignment varied between classes, the best overall diagnostic accuracy was achieved
using the top 50 genes per class. A similar level of accuracy was achieved using a
variety of other supervised learning algorithms, including k-NN and SVM.
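
The repeated resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names `stratified_holdout` and `repeated_holdout` and the callback `fit_and_score` are hypothetical, and the per-class rounding of the one-fourth test share is an assumption. The point it illustrates is that gene reselection must occur inside each round, on training indices only, so that each test set remains blinded.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.25, rng=None):
    """Split sample indices into train/test, holding out about
    `test_fraction` of each class (at least one case per class)."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def repeated_holdout(labels, fit_and_score, rounds=10, seed=0):
    """Draw a fresh split per round and score it.

    `fit_and_score(train, test)` is a hypothetical callback that selects
    genes and trains on `train` only, then returns accuracy on `test`.
    Returns (mean accuracy, lowest accuracy, highest accuracy).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        train, test = stratified_holdout(labels, rng=rng)
        scores.append(fit_and_score(train, test))
    return sum(scores) / len(scores), min(scores), max(scores)
```

Reporting the mean together with the lowest and highest round mirrors the way the text reports an average accuracy with its range.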

Interestingly, of the rare misclassification errors, two were cases of BCR-ABL
expressing ALL that by gene expression analysis were classified as hyperdiploid >50
chromosomes. The karyotype of these cases showed the presence of both the
Philadelphia chromosome and a hyperdiploid karyotype consisting of >50
chromosomes, including trisomy of chromosomes X and 21 (data not shown). The
expression profile thus correctly identified the presence of the hyperdiploid >50
chromosomes class; however, since each case is assigned to only a single class, the
algorithm failed to correctly identify the presence of BCR-ABL. Nevertheless, the
data presented demonstrate the exceptional accuracy of this single platform for the
diagnosis of the prognostically important subtypes of ALL.
Overview of Experimental Procedure

A. Gene expression profiling 

The preparation of mononuclear cell suspensions from diagnostic bone
marrow aspirates, extraction of total RNA, and preparation of hybridization solutions
were performed as described for Example 1. Individual hybridization solutions from
our previous study had been stored at -80°C since initial hybridization (approximately
1 year). These solutions were thawed and hybridized to Affymetrix® HG-U133A and
HG-U133B oligonucleotide microarrays (Affymetrix Inc., Santa Clara, CA) according
to Affymetrix protocols. In two cases where the original hybridization solutions were
no longer available, replicate viably frozen mononuclear cell preparations from the
diagnostic bone marrow aspirate were obtained, and RNA was isolated and cDNA and
cRNA were synthesized, labeled, fragmented and hybridized as described for Example 1.

After sample hybridization, arrays were stained with phycoerythrin-conjugated
streptavidin (Molecular Probes, Eugene, OR). Antibody amplification was
performed with biotinylated anti-streptavidin (Vector Laboratories, Burlingame, CA),
followed by staining with phycoerythrin-conjugated streptavidin (Molecular Probes).
Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA) and then
analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values
(present, marginal or absent) were determined by default parameters, and signal
values were scaled by global methods to a target value of 500. Microarray scan
images were visually inspected for apparent defects, and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining
procedures. Minimal quality control parameters for inclusion in the study included
greater than 10% present calls and a GAPDH 3'/5' ratio of < 3. The arrays included in
this study had an average % present call of 35.9% for the A chip and 21.0% for the B
chip (combined average of 28.5%).
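
The two inclusion criteria stated above can be expressed as a simple check. The sketch below is illustrative only; the function names are hypothetical, and it assumes the combined figure is the plain mean of the two per-chip averages (which gives 28.45%, consistent with the reported 28.5% after rounding).

```python
def passes_qc(percent_present, gapdh_3_to_5_ratio):
    """Apply the minimal inclusion criteria stated in the text:
    more than 10% present calls and a GAPDH 3'/5' ratio below 3."""
    return percent_present > 10.0 and gapdh_3_to_5_ratio < 3.0


def mean_percent_present(per_chip_percentages):
    """Plain arithmetic mean of per-chip % present values (an assumption
    about how the combined average was formed)."""
    return sum(per_chip_percentages) / len(per_chip_percentages)


# Reported averages for the A chip and the B chip.
combined = mean_percent_present([35.9, 21.0])
```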

B. Statistical Analysis 

The dataset was separated into a training set (100 cases) and a test set (32
cases). The identification of subtype discriminating genes was performed using the
training set. Moreover, both gene discovery and subsequent class predictions were
performed using a differential diagnosis decision tree format. In this format,
classification was performed in a sequential order starting with T-ALL and proceeding
in order through E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and
Hyperdiploid >50 chromosomes. Unassigned cases were classified as other. Samples
classified into the class under diagnosis were removed prior to proceeding to the next
level in the decision tree. In addition, prior to analysis a variation filter was
applied to remove any probe set that showed minimal variation across the dataset, and
thus contributed minimally, if at all, to the discrimination of leukemia subtypes.
Specifically, probe sets were eliminated from further analysis if the number of cases
with a present call was less than 1/2 the number of samples comprising the leukemia
subgroup under analysis, if the signal value was < 100 in all samples in the dataset,
or if the difference between the maximal and minimal signal values in the dataset was
less than 100. In addition, all signal values with absent or marginal calls were reset
to 1, while probe sets with a present "P" call and a signal <100 had the signal reset
to 100. The values for signals from the Affymetrix® control sets were removed prior to
analysis.
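
The variation filter and signal resets described above may be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, detection calls are assumed to be encoded as "P", "M" and "A", and the filter is assumed to be evaluated on the raw signals before the resets are applied, an ordering the text does not specify.

```python
def keep_probe_set(signals, calls, subgroup_size):
    """Return True if a probe set survives the variation filter.

    signals: signal values for one probe set across all samples
    calls: matching detection calls ("P", "M" or "A")
    subgroup_size: number of samples in the subgroup under analysis
    """
    present = sum(1 for call in calls if call == "P")
    if present < subgroup_size / 2:
        return False  # too few present calls
    if all(signal < 100 for signal in signals):
        return False  # never reaches a signal of 100
    if max(signals) - min(signals) < 100:
        return False  # maximal minus minimal signal is below 100
    return True


def reset_signals(signals, calls):
    """Reset absent/marginal signals to 1 and low present signals to 100."""
    adjusted = []
    for signal, call in zip(signals, calls):
        if call != "P":
            adjusted.append(1.0)
        elif signal < 100:
            adjusted.append(100.0)
        else:
            adjusted.append(float(signal))
    return adjusted
```

For example, a probe set with signals 50, 400, 90 and 250 and calls A, P, P, P in a subgroup of four samples is retained, and its adjusted signals become 1, 400, 100 and 250.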

Unsupervised hierarchical clustering and principal component analysis (PCA)
were performed using GeneMaths software (version 1.5, Applied Maths, Belgium).
Data reduction to define the genes most useful in class distinction was primarily
performed using a chi-square metric. In this procedure, an entropy-based
discretization method was first applied to identify genes whose expression across the
dataset showed differentiation between class and non-class [17]. The assigned
discretized value for the gene was then used in a chi-square calculation to determine
if the association with a class was more than would be expected by random chance.
The stronger the association with the class, the larger the chi-square value calculated.
For the genes that could not be discretized, their chi-squared values were set to zero.
To evaluate the statistical significance of the discriminating genes, we used a
permutation test in which, for each class, case labels were randomly reassigned to
generate new groups of identical size. The label-permuted data were discretized again
and the chi-square values were recalculated. The permutation test was repeated a
total of 1000 times. The true chi-square value for each probe set was then compared
to the values generated from the 1000 permutations to determine how many times a
chi-square value for a probe set in a randomly labeled group was greater than that
obtained for the true class distinction. A p value was calculated as the number of
times the chi-square value exceeded the true value in the 1000 permutations.
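
The discretize-then-test procedure and its permutation check can be sketched as below. Several simplifications are assumed and should not be read as the method of Example 1: a single cut point chosen to minimize class entropy stands in for the entropy-based discretization, a gene is treated as not discretizable only when all of its values are equal, and the p value is expressed as the count of exceedances divided by the number of permutations. All function names are hypothetical.

```python
import math
import random


def entropy(labels):
    """Binary class entropy in bits; labels are 1 (class) or 0 (non-class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_cut(values, labels):
    """Threshold minimizing weighted class entropy; None if values are all equal."""
    pairs = sorted(zip(values, labels))
    best, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_h, best = h, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best


def chi_square(values, labels):
    """Chi-square statistic of the 2x2 table (above/below cut vs class/non-class)."""
    cut = best_cut(values, labels)
    if cut is None:
        return 0.0  # genes that cannot be discretized are set to zero
    observed = [[0, 0], [0, 0]]
    for value, label in zip(values, labels):
        observed[int(value > cut)][int(label)] += 1
    n = len(values)
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = sum(observed[r]) * (observed[0][c] + observed[1][c]) / n
            if expected > 0:
                stat += (observed[r][c] - expected) ** 2 / expected
    return stat


def permutation_p(values, labels, n_perm=1000, seed=0):
    """Shuffle labels (group sizes preserved), rediscretize, and count exceedances."""
    rng = random.Random(seed)
    true_stat = chi_square(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(values, shuffled) > true_stat:
            exceed += 1
    return true_stat, exceed / n_perm
```

Because the cut point is re-derived for every shuffled labeling, the permuted statistics include the selection step, which is what the text describes when it states that the label-permuted data were discretized again.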

The discriminating genes selected were then used in supervised learning
algorithms to build classifiers that could identify the specific genetic subgroup.
Algorithms used included k-Nearest Neighbors (k-NN), Support Vector Machine
(SVM), and an artificial neural network (ANN). See Example 1; Witten and Frank
(1999) Data mining: Practical machine learning tools and techniques with Java
implementations, Morgan Kaufmann; Platt (1998) Fast training of support vector
machines using sequential minimal optimization, in Advances in kernel methods -
support vector learning, Schölkopf B, Burges C, and Smola A, eds., MIT Press; and
Cover and Hart (1967) IEEE Transactions on Information Theory 13:21-27.
Performance of each model was initially assessed by three-fold cross validation on a
randomly selected stratified training set. True error rates of the best performing
classifiers were then determined using the remaining one-fourth of the samples as a
blinded test group. Class assignment required that a sample's calculated node value
exceed a statistically determined confidence level in order for it to be assigned to a
class. Details of the supervised learning algorithms and their use are described below.
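
Three-fold cross validation on a stratified training set can be sketched as follows. The fold-assignment scheme, the function names, and the `fit_and_predict` callback are illustrative assumptions; any of the classifiers named above could be supplied through that callback.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign sample indices to k folds so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for members in by_class.values():
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def cross_validate(labels, fit_and_predict, k=3, seed=0):
    """Return the overall cross-validated accuracy.

    `fit_and_predict(train, held_out)` is a hypothetical callback that
    trains on the `train` indices and returns {index: predicted label}
    for every index in `held_out`.
    """
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predictions = fit_and_predict(train, held_out)
        correct += sum(predictions[i] == labels[i] for i in held_out)
    return correct / len(labels)
```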


Detailed Experimental Procedures 

A. Patient Dataset 

132 cases of pediatric ALL were selected from the original 327 diagnostic
bone marrow aspirates described in Example 1 for reanalysis on the higher density
U133A and B microarrays. The selection of cases was based on having sufficient
numbers of each subtype to build accurate class predictions, rather than reflecting the
actual frequency of these groups in the pediatric population.

B. Hybridization of microarrays

The hybridization solutions according to Example 1 were thawed at 45°C, then
microcentrifuged for 5 minutes to remove any insoluble material from the mixture.
The hybridization solutions were added to U133A chips and allowed to hybridize for
16 hours at 45°C. At the end of the incubation period, the hybridization solution was
removed from each U133A chip and refrozen. Subsequently, the hybridization solutions
were thawed and hybridized to the U133B chip.

A non-stringent wash buffer (6X SSPE, 0.01% Tween 20) was added to each
chip cassette after the hybridization solution was removed, and the cassette was
allowed to equilibrate to room temperature. The microarray cassettes were then placed
on the fluidics station and the antibody amplification protocol was performed. The
arrays were washed at 25°C with the non-stringent buffer followed by a more stringent
wash at 50°C with 100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then
stained with Streptavidin Phycoerythrin (SAPE, Molecular Probes, Eugene, OR) for
10 minutes at 25°C. Following another non-stringent wash, the arrays were
hybridized for 10 minutes at 25°C with an antibody solution (100 mM MES, 1 M
[Na+], 0.05% Tween 20, 2 mg/ml BSA, 0.1 mg/ml goat IgG, and 3 µg/ml biotinylated
antibody). This solution was removed and the cassettes were restained with the SAPE
solution.

Arrays were scanned on a laser confocal scanner (Agilent, Palo Alto, CA) and
then analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values
(present, marginal or absent) were determined by default parameters, and signal
values were scaled by global methods to a target value of 500. After completing the
scans, the arrays were visually inspected for defects and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining
procedures.

C. Statistical methods 

The chi-square metric and the kNN and ANN supervised learning algorithms
were performed as described for Example 1. The SVM supervised learning algorithm
that was used in this study is available as part of the software package R v1.6.0. See
Ribeiro and Brown, The ISBA Bulletin, 8(1):12-16, and www.r-project.org.

To determine the performance of each model using ANN, a confidence
threshold was built for each diagnostic subtype utilizing a modification of the method
described by Khan et al. (2001) Nat. Med. 7:673-679. Models were built based on a
decision tree format where each level of the decision tree contains only two possible
distinctions: class and non-class (for example, T versus non-T). At each level, using
only samples in the training set, 3 ANN models were built by 3-fold cross validation.
The training set samples were then shuffled and 3 additional ANN models were built.
This model building process was repeated a total of 100 times at each step of the
decision tree. Then an empirical probability distribution for the ANN output node
value was built only for the subtype under study, for example, T-ALL at the first step
of the decision tree. Only nodal values greater than 0.5 for each subtype were
included. For each individual sample in the training set, the 100 validation subtype
node values were averaged and compared to the threshold. An individual sample was
assigned to the subtype under study only when its average subtype nodal value was
greater than the 95% confidence threshold. For samples in the test set, subtype nodal
values were averaged from all models generated in the 3-fold cross validation. A
sample was assigned to the class under study when the average subtype nodal value was
greater than the 95% confidence level defined on the training set. A sample not
assigned to the subtype progressed to the next level of the decision tree, where the
entire process was repeated.
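
The thresholding and sequential assignment just described can be sketched as follows. How the 95% confidence threshold is derived from the empirical distribution is not spelled out in the text, so the sketch assumes it is the value that 95% of the retained node values (those greater than 0.5) equal or exceed; that choice, the function names, and the data layout are assumptions. The ANN models themselves are outside the sketch, which takes their output node values as given.

```python
# Order of the decision tree levels as stated in the specification.
ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50"]


def confidence_threshold(node_values, level_percent=95):
    """Empirical threshold from training node values for one subtype.

    Only values greater than 0.5 are retained. The returned value is the
    one that `level_percent` of the retained values equal or exceed
    (an assumed reading of the 95% confidence threshold).
    """
    kept = sorted(v for v in node_values if v > 0.5)
    if not kept:
        raise ValueError("no node values above 0.5")
    cut_index = (100 - level_percent) * len(kept) // 100
    return kept[cut_index]


def classify(sample_node_values, thresholds, order=ORDER):
    """Walk the decision tree for one sample.

    sample_node_values: subtype -> list of output node values, one per model
    thresholds: subtype -> threshold defined on the training set
    Returns the first subtype whose averaged node value exceeds its
    threshold, or "Other" if the sample is never assigned.
    """
    for subtype in order:
        values = sample_node_values[subtype]
        average = sum(values) / len(values)
        if average > thresholds[subtype]:
            return subtype
    return "Other"
```

Because a sample leaves the tree at the first level that claims it, each case receives exactly one class, which is the behavior noted above for the two BCR-ABL cases that were assigned to hyperdiploid >50.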



All publications and patent applications mentioned in the specification are
indicative of the level of those skilled in the art to which this invention pertains. All
publications and patent applications are herein incorporated by reference to the same
extent as if each individual publication or patent application was specifically and
individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be obvious
that certain changes and modifications may be practiced within the scope of the
appended claims.
THAT WHICH IS CLAIMED:

1. A method of assigning a subject affected by leukemia to a leukemia
risk group, said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by leukemia;

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential expression
in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the
subject expression profile to thereby assign said subject affected by leukemia to a
leukemia risk group.

2. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the T-ALL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 7;

b) a value representing the expression level of the gene shown in
Table 14;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 21;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 28;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 35;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 59; and

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 67.
3. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the E2A-PBX1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 3;

b) a value representing the expression level of the gene shown in
Table 10;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 17;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 24;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 31;

f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 64; and

h) values representing the expression levels of at least one of the
genes shown in Table 71.

4. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the TEL-AML1 risk group comprise 
values selected from the group consisting of: 
25 a) values representing the expression levels of at least 20 genes 

selected from the genes shown in Table 8; 

b) values representing the expression levels of the genes shown in 

Table 15; 

c) values representing the expression levels of at least 20 genes 
30 selected from the genes shown in Table 22; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 29; 
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e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 36; 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 55;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 68; and

h) values representing the expression levels of at least one of the 
genes shown in Table 74. 

5. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the BCR-ABL risk group comprise
values selected from the group consisting of: 

a) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 2; 

b) values representing the expression levels of the genes shown in
Table 9;

c) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 16; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 23;

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 30;

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 54; 

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 63; and

h) values representing the expression levels of at least one of the 
genes shown in Table 70. 

6. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the MLL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 5; 

b) values representing the expression levels of the genes shown in
Table 12;

c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 19;

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 26; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 33;

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 57; 

g) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 66; and 

h) values representing the expression levels of at least one of the
genes shown in Table 73.

7. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the Hyperdiploid >50 risk group 
comprise values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 4; 

b) values representing the expression levels of the genes shown in
Table 11;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 18;

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 25; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 32;

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 56;

g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 65; and 

h) values representing the expression levels of at least one of the
genes shown in Table 72.

8. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the Novel risk group comprise values 
selected from the group consisting of: 

a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 6;

b) values representing the expression level of the genes shown in
Table 13;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 20;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 27;

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 34; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 58.

9. The method of claim 1, wherein said sample from said subject affected 
by ALL comprises leukemic blasts. 

10. The method of claim 9, wherein said sample from said subject affected
by ALL comprises at least 35% leukemic blasts.

11. The method of claim 10, wherein said sample from said subject
affected by ALL comprises at least 75% leukemic blasts.

12. The method of claim 9 wherein said sample comprises leukemic blasts 
derived from peripheral blood.

13. The method of claim 9 wherein said sample comprises blast cells
derived from bone marrow. 

14. A method of predicting whether a subject affected by leukemia has an
increased risk of relapse, said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk 
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, and Novel; 

b) providing a subject expression profile of a sample from said 
subject affected by leukemia;

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 
leukemia is assigned, wherein the subject expression profile and the reference 
expression profile comprise one or more values representing the expression level of a
gene having differential expression in subjects affected by leukemia who will relapse
after conventional therapy; and 

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 
leukemia risk group to which the subject affected by leukemia is assigned to thereby
determine whether the subject affected by leukemia has an increased risk of relapse.

15. The method of claim 14, wherein the step of assigning the subject
affected by leukemia to a leukemia risk group is performed according to the method
of claim 1.

16. The method of claim 14, wherein said subject affected by leukemia is 
assigned to the T-ALL risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 8 genes selected from the genes shown in Table 44. 

17. The method of claim 14, wherein said subject affected by leukemia is
assigned to the Hyperdiploid >50 risk group and said subject expression profile and
said reference expression profile comprise values representing the expression levels of
at least 5 genes selected from the genes shown in Table 45. 

18. The method of claim 14, wherein said subject affected by leukemia is 
assigned to the TEL-AML1 risk group and said subject expression profile and said
reference expression profile comprise values representing the expression levels of at
least 3 genes selected from the genes shown in Table 46. 

19. The method of claim 14, wherein said subject affected by leukemia is
assigned to the MLL risk group and said subject expression profile and said reference
expression profile comprise values representing the expression levels of at least 5
genes selected from the genes shown in Table 47. 

20. The method of claim 14, wherein said subject affected by leukemia is 
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or
BCR-ABL risk group and said subject expression profile and said reference
expression profile comprise values representing the expression levels of at least 4
genes selected from the genes shown in Table 48.

21. A method of predicting whether a subject affected by TEL-AML1 has
an increased risk of developing secondary AML, said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in
subjects affected by TEL-AML1 who will develop secondary AML; and

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby determine whether the subject affected by TEL-AML1
has an increased risk of developing secondary AML.

22. A method of choosing a therapy for a subject affected by leukemia,
said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by leukemia;

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the 
subject expression profile to thereby choose a therapy for the subject affected by 
leukemia. 
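
By way of illustration only, the selection step of claim 22 (choosing the reference expression profile most similar to the subject expression profile) might be sketched as follows. The claim does not specify a similarity measure; Pearson correlation is assumed here purely as an example, and the function names and the numeric values are hypothetical toy data, not values taken from the Tables.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sd_x * sd_y)


def most_similar_group(subject, references):
    """Return the risk group whose reference profile correlates best with the subject."""
    return max(references, key=lambda group: pearson(subject, references[group]))


# Hypothetical reference profiles over the same four genes (toy numbers).
references = {
    "T-ALL": [9.0, 1.0, 1.0, 2.0],
    "MLL": [1.0, 8.0, 2.0, 1.0],
    "Novel": [2.0, 1.0, 1.0, 9.0],
}
subject = [8.5, 1.5, 0.5, 2.5]
best = most_similar_group(subject, references)
```

In this sketch the chosen group would then be mapped to a therapy by whatever clinical protocol applies; that mapping is outside the scope of the example.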

23. A method of choosing a therapy for a subject affected by leukemia,
said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, and Novel;

b) providing a subject expression profile of a sample from said
subject affected by ALL;

c) providing a reference expression profile associated with the
occurrence of relapse in the leukemia risk group to which the subject affected by
leukemia is assigned, wherein the subject expression profile and the reference
expression profile comprise one or more values representing the expression level of a
gene having differential expression in subjects who will relapse after conventional
therapy; and

d) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with relapse in the
leukemia risk group to which the subject affected by ALL is assigned to thereby choose
a therapy for said subject affected by ALL.

24. The method of claim 23, wherein the step of assigning the subject
affected by leukemia to a leukemia risk group is performed according to the method 
of claim 1. 

25. The method of claim 23, wherein said subject affected by leukemia is 
assigned to the T-ALL risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 8 genes selected from the genes shown in Table 44. 

26. The method of claim 23, wherein said subject affected by leukemia is 
assigned to the Hyperdiploid >50 risk group and said subject expression profile and 
said reference expression profile comprise values representing the expression levels of 
at least 5 genes selected from the genes shown in Table 45. 

27. The method of claim 23, wherein said subject affected by leukemia is 
assigned to the TEL-AML1 risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 3 genes selected from the genes shown in Table 46. 

28. The method of claim 23, wherein said subject affected by leukemia is 
assigned to the MLL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 5 
genes selected from the genes shown in Table 47. 

29. The method of claim 23, wherein said subject affected by leukemia is 
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or
BCR-ABL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 4 
genes selected from the genes shown in Table 48. 

30. A method of choosing a therapy for a subject affected by TEL-AML1, 
said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the 
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in
subjects affected by TEL-AML1 who will develop secondary AML; and

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby choose a therapy for the subject affected by TEL-AML1.

31. The method of claim 30, wherein said subject expression profile and
said reference expression profile comprise values representing the expression levels of 
at least 7 genes selected from the genes shown in Table 48. 

32. A method to aid in the determination of a prognosis for a subject
affected by leukemia, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by leukemia; 

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the 
subject expression profile to thereby determine the prognosis for the subject affected 
by leukemia. 

33. A method to aid in the determination of the prognosis for a subject
affected by leukemia, said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, and Novel;

b) providing a subject expression profile of a sample from said 
subject affected by leukemia; 

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 
leukemia is assigned, wherein the subject expression profile and the reference 
expression profile comprise one or more values representing the expression level of a 
gene having differential expression in subjects who will relapse after conventional 
therapy; and

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 
leukemia risk group to which the subject affected by leukemia is assigned to thereby
determine the prognosis for the subject affected by leukemia.

34. A method to aid in the determination of the prognosis for a subject 
affected by TEL-AML1, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by TEL-AML1; 

b) providing a reference expression profile associated with the 
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the 
subject expression profile and the reference expression profile comprise one or more 
values representing the expression level of a gene having differential expression in 
subjects affected by TEL-AML1 who will develop secondary AML after conventional 
therapy; and 

c) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with the occurrence 
of secondary AML to thereby determine the prognosis for the subject affected by 
TEL-AML1.

35. A method of assigning a subject affected by ALL to an ALL risk group
selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, 
MLL, Hyperdiploid >50, and Novel, said method comprising: 

a) providing a subject expression profile of a sample from said
subject affected by ALL;

b) providing a reference expression profile associated with the
T-ALL risk group wherein the subject expression profile and the reference expression
profile comprise one or more values representing the expression level of a gene
having differential expression in the T-ALL risk group;

c) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the T-ALL risk group to thereby determine whether the subject affected by ALL is in
the T-ALL risk group;

d) if the subject affected by ALL is not in the T-ALL risk group,
providing a reference expression profile associated with the E2A-PBX1 risk group
wherein the subject expression profile and the reference expression profile comprise
one or more values representing the expression level of a gene having differential
expression in the E2A-PBX1 risk group;

e) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the E2A-PBX1 risk group to thereby determine whether the subject affected by ALL
is in the E2A-PBX1 risk group;

f) if the subject affected by ALL is not in the E2A-PBX1 risk
group, providing a reference expression profile associated with the TEL-AML1 risk
group wherein the subject expression profile and each reference expression profile
comprise one or more values representing the expression level of a gene having
differential expression in the TEL-AML1 risk group;

g) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the TEL-AML1 risk group to thereby determine whether the subject affected by ALL
is in the TEL-AML1 risk group;

h) if the subject affected by ALL is not in the TEL-AML1 risk
group, providing a reference expression profile associated with the BCR-ABL risk
group wherein the subject expression profile and each reference expression profile
comprise one or more values representing the expression level of a gene having
differential expression in the BCR-ABL risk group;

i) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the BCR-ABL risk group to thereby determine whether the subject affected by ALL is
in the BCR-ABL risk group;

j) if the subject affected by ALL is not in the BCR-ABL risk
group, providing a reference expression profile associated with the MLL risk group
wherein the subject expression profile and each reference expression profile
comprise one or more values representing the expression level of a gene having
differential expression in the MLL risk group;

k) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the MLL risk group to thereby determine whether the subject affected by ALL is in
the MLL risk group;

l) if the subject affected by ALL is not in the MLL risk group,
providing a reference expression profile associated with the Hyperdiploid >50 risk
group wherein the subject expression profile and each reference expression profile
comprise one or more values representing the expression level of a gene having
differential expression in the Hyperdiploid >50 risk group;

m) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with
the Hyperdiploid >50 risk group to thereby determine whether the subject affected by
ALL is in the Hyperdiploid >50 risk group;

n) if the subject affected by ALL is not in the Hyperdiploid >50
risk group, providing a reference expression profile associated with the Novel risk
group wherein the subject expression profile and each reference expression profile
comprise one or more values representing the expression level of a gene having
differential expression in the Novel risk group; and

o) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with 
the Novel risk group to thereby determine whether the subject affected by ALL is in 
the Novel risk group. 
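
For illustration, the sequential testing recited in steps a) through o) of claim 35 can be read as an ordered cascade in which each risk group is tested only if every earlier group has been ruled out. The sketch below assumes a simple mean-absolute-difference threshold as a stand-in for the "statistically significant similarity" test, which the claim does not define; the gene identifiers, tolerance, and expression values are hypothetical.

```python
# Order in which the risk groups are tested, following steps b) through o).
GROUP_ORDER = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL",
    "MLL", "Hyperdiploid >50", "Novel",
]


def close_enough(subject, reference, tolerance=1.0):
    """Assumed stand-in for a significance test: the mean absolute difference
    over the genes in the reference profile must not exceed `tolerance`."""
    diffs = [abs(subject[gene] - value) for gene, value in reference.items()]
    return sum(diffs) / len(diffs) <= tolerance


def assign_risk_group(subject, references, is_similar=close_enough):
    """Test each group in order and return the first match, or None."""
    for group in GROUP_ORDER:
        if is_similar(subject, references[group]):
            return group
    return None


# Hypothetical one-gene reference profiles, one per risk group.
references = {group: {f"g{i}": 9.0} for i, group in enumerate(GROUP_ORDER, start=1)}
# This subject resembles both TEL-AML1 (g3) and MLL (g5).
subject = {"g1": 1.0, "g2": 1.5, "g3": 8.6, "g4": 1.0, "g5": 9.2, "g6": 1.0, "g7": 1.0}
assigned = assign_risk_group(subject, references)
```

Because the cascade stops at the first match, the toy subject is assigned to TEL-AML1 even though it also resembles the MLL reference, which shows why the order of the steps matters.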

36. An array for use in a method of assigning a subject affected by
leukemia to a leukemia risk group comprising a substrate having a plurality of 
addresses, wherein each address has disposed thereon a capture probe that can 
specifically bind a nucleic acid molecule selected from the group consisting of: 

a) a nucleic acid molecule that is differentially expressed in at
least one leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel;

b) a nucleic acid molecule that is differentially expressed in 
subjects affected by leukemia who will relapse after conventional therapy; and 

c) a nucleic acid molecule that is differentially expressed in
subjects affected by leukemia who will develop secondary AML after conventional
therapy. 

37. The array of claim 36, wherein each nucleic acid molecule that is
differentially expressed in at least one leukemia risk group is selected from the group
consisting of the genes shown in Tables 2-36, 63-68, and 70-74. 

38. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will relapse after
conventional therapy is selected from the group consisting of the genes shown in
Tables 44-48. 

39. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will develop secondary
AML after conventional therapy is selected from the group consisting of the genes
shown in Table 52.

addresses.

41. The array of claim 40, wherein the substrate has greater than 40
addresses.

42. The array of claim 41, wherein the substrate has greater than 68 
addresses. 

43. The array of claim 36, wherein the substrate has no more than 500
addresses.

44. A kit for assigning a subject affected by ALL to a leukemia risk group, 
said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in at least one leukemia risk 
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1,
BCR-ABL, MLL, Hyperdiploid >50, and Novel; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 
the array. 
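
As a purely illustrative sketch of element b), a plurality of digitally-encoded expression profiles, each holding a plurality of values, could be stored on a computer-readable medium in a text format such as JSON. The claim does not prescribe any encoding; the field names, probe identifiers, and values below are hypothetical.

```python
import json

# Each profile pairs a risk-group label with one value per array probe.
profiles = [
    {"risk_group": "T-ALL", "values": {"probe_1": 9.1, "probe_2": 0.8}},
    {"risk_group": "MLL", "values": {"probe_1": 1.2, "probe_2": 7.5}},
]

# Encode to a string suitable for writing to a file, then decode it again.
encoded = json.dumps(profiles, sort_keys=True)
decoded = json.loads(encoded)
```

Any format that preserves the association between each value and the nucleic acid molecule it represents would serve equally well in this sketch.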

45. A kit for assigning a subject affected by ALL to a leukemia risk group,
said kit comprising:

a) an array according to claim 37; and 

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by
the array.

46. A kit for predicting whether a subject affected by leukemia has an
increased risk of relapse, said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a
nucleic acid molecule that is differentially expressed in subjects affected by leukemia
who will relapse following conventional therapy; and 

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by
the array.

47. A kit for predicting whether a subject affected by leukemia has an 
increased risk of relapse, said kit comprising: 

a) an array according to claim 38; and

b) a computer-readable medium having a plurality of digitally-
encoded expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 

48. A kit for predicting whether a subject affected by TEL-AML1 has an
increased risk of relapse, said kit comprising:

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in subjects affected by
TEL-AML1 who will relapse after conventional therapy; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 
the array.

49. A kit for predicting whether a subject affected by TEL-AML1 has an 
increased risk of relapse, said kit comprising: 

a) an array according to claim 39; and

b) a computer-readable medium having a plurality of digitally-
encoded expression profiles wherein each profile of the plurality has a 
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 

50. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a
nucleic acid molecule that is differentially expressed in at least one leukemia risk
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1,
BCR-ABL, MLL, Hyperdiploid >50, and Novel; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by
the array. 

51. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array according to claim 37; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a 
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 

52. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a gene that is differentially 
expressed in at least one leukemia risk group selected from the group consisting of
T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel.

53. The computer readable medium of claim 52, wherein the expression
profiles comprise values selected from the group consisting of:

a) values representing the expression levels of at least 7 genes
selected from the genes shown in Tables 2-8, 16-36, 54-60, and 63-68;

b) a value representing the expression level of the gene shown in
Table 10;

c) a value representing the expression level of the gene shown in
Table 14;

d) values representing the expression levels of the genes shown in 
Tables 9, 11, 12, 13, and 15; and 

e) values representing the expression level of at least one gene
shown in Tables 70, 71, 72, 73, and 74.



54. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a gene that is differentially
expressed in subjects affected by leukemia who will relapse following conventional
therapy. 

55. The computer readable medium of claim 54, wherein the expression 
profiles comprise values selected from the group consisting of: 

a) values representing the expression levels of at least 8 genes
selected from the genes shown in Table 44;

b) values representing the expression levels of at least 5 genes 
selected from the genes shown in Table 45; 

c) values representing the expression levels of at least 3 genes 
selected from the genes shown in Table 46;

d) values representing the expression levels of at least 5 genes 
selected from the genes shown in Table 47; and 

e) values representing the expression levels of at least 4 genes 
selected from the genes shown in Table 48. 

56. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a gene that is differentially
expressed in subjects affected by leukemia who will develop secondary AML. 

57. The computer readable medium of claim 56, wherein the expression
profiles comprise values selected from values representing the expression levels of at
least 7 genes selected from the genes shown in Table 52.

58. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the T-ALL risk group comprise values
selected from the group consisting of:

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 7; 

b) a value representing the expression level of the gene shown in
Table 14;

c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 21;

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 28; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 35; and

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 59. 

59. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the E2A-PBX1 risk group comprise
values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 3; 

b) a value representing the expression level of the gene shown in 

30 Table 10; 

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 17; 
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d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 24; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 31; 

5 f) values representing the expression levels of at least 20 genes 

selected from the genes shown in Table 55; 

g) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 64; and 

h) values representing the expression levels of at least one of the 
1 0 genes shown in Table 7 1 . 

60. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the TEL-AML1 risk group comprise 
values selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 8; 

b) values representing the expression levels of the genes shown in 
Table 15; 

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 22; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 29; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 36; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 55. 

61. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the BCR-ABL risk group comprise 
values selected from the group consisting of: 

a) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 2; 

b) values representing the expression levels of the genes shown in 
Table 9; 

c) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 16; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 23; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 30; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 54. 

62. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the MLL risk group comprise values 
selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 5; 

b) values representing the expression levels of the genes shown in 
Table 12; 

c) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 19; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 26; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 33; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 57. 

63. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the Hyperdiploid >50 risk group 
comprise values selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 4; 

b) values representing the expression levels of the genes shown in 
Table 11; 

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 18; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 25; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 32; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 56. 

64. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in at least one leukemia risk group is selected from the group 
consisting of the genes shown in Tables 2-36. 
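
By way of illustration only, the recurring claim condition "at least N genes selected from the genes shown in Table T" and the comparison of a subject expression profile with reference expression profiles may be sketched as follows. The gene identifiers, the numeric values, the function names, and the use of Euclidean distance are all assumptions made for this sketch; they are not taken from the tables or from the comparison method of the specification.

```python
from math import sqrt

# Hypothetical gene identifiers standing in for the genes of one table.
TABLE_GENES = {"geneA", "geneB", "geneC", "geneD"}


def covers_table(profile, table_genes, minimum):
    """Return True if the profile holds values for at least `minimum` table genes."""
    return len(set(profile) & set(table_genes)) >= minimum


def distance(subject, reference):
    """Euclidean distance over genes present in both profiles (an assumed metric)."""
    shared = set(subject) & set(reference)
    if not shared:
        raise ValueError("profiles share no genes")
    return sqrt(sum((subject[g] - reference[g]) ** 2 for g in shared))


def nearest_risk_group(subject, references):
    """Return the label of the reference profile closest to the subject profile."""
    return min(references, key=lambda label: distance(subject, references[label]))


# Illustrative values only; a real profile would hold measured expression levels.
subject = {"geneA": 2.0, "geneB": 0.5, "geneC": 1.0}
references = {
    "T-ALL": {"geneA": 2.1, "geneB": 0.4, "geneC": 1.2},
    "MLL": {"geneA": 0.2, "geneB": 3.0, "geneC": 0.1},
}
```

In this sketch a profile is a mapping from gene identifier to expression value, so the "at least N genes" condition reduces to a set intersection, and the risk group is chosen as the reference profile at the smallest distance from the subject profile.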




(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property 
Organization 

International Bureau 

(43) International Publication Date 
9 October 2003 (09.10.2003) 







(10) International Publication Number 

WO 2003/083140 A3 



(51) International Patent Classification 7: C12Q 1/68, 
C12N 15/11 

(21) International Application Number: PCT/US2003/008486 



(22) International Filing Date: 19 March 2003 (19.03.2003) 



(25) Filing Language: English 

(26) Publication Language: English 



(74) Agent: COULTER, Kathryn, L. Alston & Bird; Bank of 
America Plaza, Suite 4000, 101 South Tryon Street, Charlotte, 
NC 28280-4000 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, 
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH, 
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, 
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, 
MX, MZ, NI, NO, NZ, OM, PH, PL, PT, RO, RU, SC, SD, 
SE, SG, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, 
UZ, VN, YU, ZA, ZM, ZW. 



(30) Priority Data: 
60/367,144    22 March 2002 (22.03.2002)    US 



(71) Applicant (for all designated States except US): ST. JUDE 
CHILDREN'S RESEARCH HOSPITAL, INC. 
[US/US]; 332 N. Lauderdale Street, Memphis, TN 
38105-2794 (US). 

(72) Inventors; and 
(75) Inventors/Applicants (for US only): DOWNING, James, 
R. [US/US]; 7650 Chapel Ridge Drive, Cordova, TN 
38106 (US). YEOH, Eng-Juh [MY/SG]; 5 Lower Kent 
Ridge Road, Singapore 119074, Republic of Singapore 
(SG). WILKINS, Dawn, E. [US/US]; 3321 Whippoorwill 
Lane, Oxford, MS 38655 (US). WONG, Limsoon 
[SG/SG]; 6B Balmeg Hill #02-01, Singapore 119908, 
Republic of Singapore (SG). 



(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), 
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), 
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, 
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO, 
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, 
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG). 

Published: 

— with international search report 

(88) Date of publication of the international search report: 

26 February 2004 

For two-letter codes and other abbreviations, refer to the "Guid- 
ance Notes on Codes and Abbreviations" appearing at the begin- 
ning of each regular issue of the PCT Gazette. 






(54) Title: CLASSIFICATION AND PROGNOSIS PREDICTION OF ACUTE LYMPHOBLASTIC LEUKEMIA BY GENE 
EXPRESSION PROFILING 

(57) Abstract: The present invention provides methods and compositions useful for diagnosing and choosing treatment for leukemia 
patients. The claimed methods include methods of assigning a subject affected by leukemia to a leukemia risk group, methods of 
predicting whether a subject affected by leukemia has an increased risk of relapse, methods of predicting whether a subject affected by 
leukemia has an increased risk of developing secondary acute myeloid leukemia, methods to aid in the determination of a prognosis 
for a subject affected by leukemia, methods of choosing a therapy for a subject affected by leukemia, and methods of monitoring the 
disease state in a subject undergoing one or more therapies for leukemia. The claimed compositions include arrays having capture 
probes for the differentially-expressed genes of the invention, computer readable media having digitally-encoded expression profiles 
associated with leukemia risk groups, and kits for diagnosing and choosing therapy for leukemia patients. 






INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US03/08486 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(7) C12Q 1/68; C12N 15/11 

US CL 435/6; 536/24.3 

According to International Patent Classification (IPC) or to both national classification and IPC 



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 435/6; 536/24.3 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
Please See Continuation Sheet 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 

Category* 
Citation of document, with indication, where appropriate, of the relevant passages 
Relevant to claim No. 

Category: X 
Citation: WO 01/67061 A2 (YEDA RESEARCH AND DEVELOPMENT CO. LTD) 13 September 
2001 (13.09.2001), pages 20-23. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 

Category: A,P 
Citation: US 2002/0111742 A1 (ROCKE et al.) 15 August 2002 (15.08.2002), pages 2, 8-10, 15, 
16. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 

Category: X 
Citation: GOLUB et al. Molecular Classification of Cancer: Class Discovery and Class Prediction 
by Gene Expression Monitoring. Science. 15 October 1999, Vol. 286, pages 531-537, 
especially page 531. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 

Citation: Database BIOSIS on STN, AN 2002:152016, FILLMORE et al. 'Gene expression 
profiling of T-cell lymphoma cell lines'. Blood. 16 November 2001, Vol. 98, No. 11, page 
158b, Abstract. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 

Citation: Database BIOSIS on STN, AN 2002:250205, FERRANDO et al. 'Prognostic 
classification of pediatric T-ALL using oligonucleotide microarrays'. Blood. 16 
November 2001, Vol. 98, No. 11, pages 759a-760a, Abstract. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 



[X] Further documents are listed in the continuation of Box C. 

[ ] See patent family annex. 



* Special categories of cited documents: 

"A" document defining the general state of the art which is not considered to be 
of particular relevance 

"E" earlier application or patent published on or after the international filing date 

"L" document which may throw doubts on priority claim(s) or which is cited to 
establish the publication date of another citation or other special reason (as 
specified) 

"O" document referring to an oral disclosure, use, exhibition or other means 

"P" document published prior to the international filing date but later than the 
priority date claimed 

"T" later document published after the international filing date or priority 
date and not in conflict with the application but cited to understand the 
principle or theory underlying the invention 

"X" document of particular relevance; the claimed invention cannot be 
considered novel or cannot be considered to involve an inventive step 
when the document is taken alone 

"Y" document of particular relevance; the claimed invention cannot be 
considered to involve an inventive step when the document is 
combined with one or more other such documents, such combination 
being obvious to a person skilled in the art 

"&" document member of the same patent family 




Date of the actual completion of the international search 
22 August 2003 (22.08.2003) 

Date of mailing of the international search report 

Name and mailing address of the ISA/US 
Mail Stop PCT, Attn: ISA/US 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, Virginia 22313-1450 
Facsimile No. (703) 305-3230 

Telephone No. 703 308-0196 

Form PCT/ISA/210 (second sheet) (July 1998) 






C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 
Citation of document, with indication, where appropriate, of the relevant passages 
Relevant to claim No. 

Category: X 
Citation: Database BIOSIS on STN, AN 2001:312132, FERRANDO et al. 'Quantitative analysis of 
oncogenic transcription factors in T-cell acute lymphoblastic leukemia'. Blood. 16 
November 2000, Vol. 96, No. 11, page 696a, Abstract. 
Relevant to claim No.: 1, 9-13, 36, 40-44, 46, 50 






Box I Observations where certain claims were found unsearchable (Continuation of Item 1 of first sheet) 

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. [X] Claim Nos.: 52-57 

because they relate to subject matter not required to be searched by this Authority, namely: 
Claims 52-57 are drawn to a mere presentation of data. 



2. [X] Claim Nos.: 2-8, 15-20, 24-29, 31, 37-39, 45, 47, 49, 51 and 58-64 

because they relate to parts of the international application that do not comply with the prescribed requirements to 
such an extent that no meaningful international search can be carried out, specifically: 
Please See Continuation Sheet 



3. [X] Claim Nos.: 15, 24 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 
6.4(a). 

Box II Observations where unity of invention is lacking (Continuation of Item 2 of first sheet) 



This International Searching Authority found multiple inventions in this international application, as follows: 
Please See Continuation Sheet 



1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all 
searchable claims. 

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite 
payment of any additional fee. 

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search 
report covers only those claims for which fees were paid, specifically claims Nos.: 

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this international search report 
is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1, 9-13, 36, 40-44, 46, 50, 
and the T-ALL risk group 

Remark on Protest 
[ ] The additional search fees were accompanied by the applicant's protest. 
[ ] No protest accompanied the payment of additional search fees. 



Form PCT/ISA/210 (continuation of first sheet(1)) (July 1998) 






Continuation of Box I Reason 2: 

Claims 2-8, 15-20, 24-29, 31, 37-39, 45, 47, 49, 51, and 58-64 are not searchable because they are drawn to subject matter 
comprising sequences that are improperly incorporated by reference because the claimed sequences are not described in the 
description at the time of filing, and the sequences referenced by database accession numbers in the tables discussed in the claims 
could be modified by the database authors subsequent to the international filing date. 

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION IS LACKING 

It is noted that claims 2-8, 16-20, 25-29, 31, 37-39, 45, 47, 49, 51, and 58-64 are not searchable because they are drawn to subject 
matter sequences that are improperly incorporated by reference because the claimed sequences are not described in the description at 
the time of filing, and the sequences referenced by database accession numbers in the tables discussed in the claims could be modified 
by the database authors subsequent to the international filing date. It is further noted that claims 15 and 24 are not searchable because 
they are improper multiple dependent claims, and claims 52-57 are not searchable because they are directed to data on computer 
readable media which is not patentable subject matter. 

This application contains the following inventions or groups of inventions which are not so linked as to form a single general 
inventive concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional examination fees must 
be paid. 

Group I, claim(s) 1, 9-13, 36, 40-44, 46, 48, and 50 drawn to a method of assigning a leukemia patient expression profile to a risk 
group and apparatus for performing the method (1st method and 1st apparatus). 

Group II, claim(s) 14, drawn to a method of determining prognosis of leukemia relapse (2nd method). 

Group III, claim(s) 21, drawn to a method of determining prognosis of secondary AML in a subject affected by TEL-AML1 (3rd 
method). 

Group IV, claim(s) 22, drawn to a method of choosing a therapy for a subject affected by leukemia by comparing expression profiles 
of the subject to expression profiles of subjects in different risk groups (4th method). 

Group V, claim(s) 23, drawn to a method of choosing a therapy for a subject affected by leukemia by comparing expression profiles 
of the subject to expression profiles of subjects who will relapse (5th method). 

Group VI, claim(s) 30, drawn to a method of choosing a therapy for a subject affected by TEL-AML1 by comparing expression 
profiles of the subject to expression profiles of subjects who will develop secondary AML (6th method). 

Group VII, claim(s) 32, drawn to a method of determining the prognosis of a subject affected by leukemia by comparing expression 
profiles of the subject to expression profiles of subjects in different risk groups (7th method). 

Group VIII, claim(s) 33, drawn to a method of determining the prognosis of a subject affected by leukemia by assigning the subject to 
a risk group and then comparing expression profiles of the subject to expression profiles of subjects in the same risk group who have 
relapsed (8th method). 

Group IX, claim(s) 34, drawn to a method of determining the prognosis of a TEL-AML1 subject by comparing expression profiles of 
the subject to expression profiles of subjects affected by TEL-AML1 (9th method). 

Group X, claim(s) 35, drawn to a method of assigning a subject affected by ALL to an ALL risk group by comparing expression 
profiles of the subject to expression profiles of subjects in different risk groups (10th method). 

This application contains claims directed to more than one species of the generic invention. These species are deemed to lack unity of 
invention because they are not so linked as to form a single general inventive concept under PCT Rule 13.1. 

In order for more than one species to be examined, the appropriate additional examination fees must be paid. The species are as 
follows: 
The seven risk group species are 1) T-ALL, 2) E2A-PBX1, 3) TEL-AML1, 4) BCR-ABL, 5) MLL, 6) Hyperdiploid >50, and 7) 
Novel. 

The claims are deemed to correspond to the species listed above in the following manner: 

Claims 1, 40-43, and 50 of Group I and claims 14, 21, 22, 23, 30, 32, 33, 34, and 35 of Groups II-X are Markush-type claims. 
Claims 9-13 of Group I are drawn to the ALL species. Claim 48 of Group I is drawn to the TEL-AML1 species. 

The following claim(s) are generic: 44 and 46 of Group I. 

The inventions listed as Groups I-X do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule 
13.2, they lack the same or corresponding special technical features for the following reasons: PCT Rule 13.1 and Annex B do not 
provide for unity of invention between two or more different products, methods of making, methods of use, or apparatus that share a 
special technical feature. Each Group is drawn to a different method with different steps and produces different results. 

The species listed above do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule 13.2, the 
species lack the same or corresponding special technical features for the following reasons: each species is drawn to a mutually 
exclusive different disease risk group. 